Strange characters on import

Hi, sorry if this is covered elsewhere, but I can’t find it. When importing text files I often get stuff like this, where quotes etc. are replaced:

Any suggestions? Thanks.

You are importing documents produced in Word for Windows, which are therefore not in the standard encoding for OS X: at a guess, Microsoft’s Windows-1252 rather than UTF-8. The characters in question are “smart quotes”, which sit at different positions in the OS X character table. You are seeing the characters that occupy the same positions in the OS X table as the quote marks do under WfW. Common problem. Do a search and replace … unless you can get smart quotes turned off in WfW.
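For anyone curious what that kind of mix-up looks like in practice, here is a minimal Python sketch. The choice of encodings (Windows-1252 for writing, Mac OS Roman for reading) is an assumption made purely for illustration; the files in this thread may involve a different pair.

```python
# Smart quotes and an em-dash, as they might appear in pasted text.
text = "\u201cSome dialogue\u201d \u2014 she said"

# Write the text out using the Windows code page (an assumed source encoding).
raw = text.encode("cp1252")

# Read the same bytes back using a different table (Mac OS Roman).
# Each byte is valid in both tables but maps to a different character.
garbled = raw.decode("mac_roman")
print(garbled)

# Reading with the same encoding that wrote the bytes restores the text.
assert raw.decode("cp1252") == text
```

The bytes on disk never change; only the table used to interpret them does, which is why a search and replace (or re-saving in the right encoding) fixes it.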

Yes, they are smart quotes, and also an em-dash and I think some other special chars that sneak in. They are from .txt files saved via TextEdit (I don’t use Windows at all, and I avoid Word on Mac), probably pasted in from Safari.
TextEdit is set to ‘automatic’ encoding format. Should I perhaps set it to UTF-8? I’m not at all familiar with this encoding stuff.

Yes, this problem is addressed in the FAQ, Section III, #1. You just need to save your file as UTF-8 and it will all be okay. You want to leave TextEdit in “automatic” mode when loading files, but once a file is loaded, choose Save As… and manually select UTF-8. You can also, as suggested in the FAQ, simply copy and paste from TextEdit.
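If you have a lot of files to fix, the same re-save can be scripted. This is only a rough Python sketch: `resave_as_utf8` is a made-up helper name, and the default source encoding of Mac OS Roman is an assumption. You need to know (or guess) what encoding the file was really saved in.

```python
from pathlib import Path


def resave_as_utf8(src, dst, source_encoding="mac_roman"):
    """Read src using source_encoding and write the same text to dst as UTF-8.

    source_encoding is an assumption; pass the file's real encoding if known.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text


# Example call (hypothetical file names):
# resave_as_utf8("notes.txt", "notes-utf8.txt")
```

Writing to a separate output file keeps the original intact in case the guessed encoding turns out to be wrong.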

Thank you!

Word won’t let me export in UTF-8. Only UTF-16. :(

Exporting it as Text Only works, though. But it loses any underlining/italics/bold I might have had.

Why would you want to export in plain text from Word? Scrivener imports .doc and .rtf from Word just fine…

Whenever I import a Word doc it gets a bunch of garbage characters all through it, especially at the beginning and end, and also quotes, ellipses, and em-dashes are deleted (not even replaced with other things, just entirely gone).

Actually, I think the ellipses might be replaced with hard returns, now that I look at it more closely.

Is there maybe a setting somewhere I’m missing? I looked through the prefs and couldn’t find anything that changed this.

Sounds to me as though your Word files might not have the .doc extension. Scrivener looks at the file extension to determine how to import a file. If there is none, the file gets imported as plain text (UTF-8). So, if you import a Word file with no .doc extension, you get junk. For instance, if I save a Word document containing the following text:

"Some dialogue."

and if I then delete the file extension and import it into Scrivener, here is what I would get:


Got it. Sounds like a good addition to the already existing “gibberish” answer.
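To make the extension rule described above concrete, here is a toy Python sketch. It is not Scrivener's actual code: the mapping, the mode names, and `pick_importer` are all invented for illustration. The only behaviour taken from this thread is that an unrecognised or missing extension falls back to plain text read as UTF-8.

```python
import os

# Hypothetical table; the real importer's list of formats is not shown here.
IMPORTERS = {".doc": "word", ".rtf": "rtf", ".txt": "plain-utf8"}


def pick_importer(filename):
    """Choose an import mode from the extension alone, defaulting to plain text."""
    ext = os.path.splitext(filename)[1].lower()
    return IMPORTERS.get(ext, "plain-utf8")


print(pick_importer("chapter.doc"))  # handled as a Word file
print(pick_importer("chapter"))      # no extension, so treated as plain text
```

A binary Word file sent down the plain-text path gets its raw bytes decoded as if they were UTF-8 text, which is where the junk comes from.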

Definitely should be in the FAQ, because users often hit this problem. It’s ironic that a Mac-only program like Scrivener must follow the DOS-Windows convention of required file extensions. I learned to use them long ago when exchanging files with colleagues. But my early writing files don’t have extensions, and that always creates trouble when I try to Scrivenerize them.

I don’t at all see how it is ironic. Could you suggest another way of recognising files easily and quickly?

Ever since Apple did away with the mandatory convention of having Creator and Type attached to every single file, they have used a combination of methods to solve the problem of file type discovery. There really is no perfect solution on any platform, but OS X’s is pretty close to the best; it is certainly my favourite. Consider how things used to be pre-OS X. Each file had its own association. If you wanted to open all JPEG files with Graphic Converter instead of Photoshop, you had to go through some really annoying and time-consuming procedures, as each file had to be changed individually. The opposite of this was Windows/DOS, where there was a central association for every 3-letter extension, and changing one changed them all. You couldn’t have one RTF always open in Word and another in RoughDraft; it was either all or nothing. OS X combines the two systems relatively painlessly. It has a central association system where you can tell the system to always open RTF files in TextEdit instead of Word, plus you can utilise the framework of that old Creator/Type system to tell just one file to go against the grain.

The consequence of this is that, since files are no longer required to have Creator/Type codes and the central association index uses extensions to establish defaults, a lot of files (99% of them these days) no longer carry their own instructions on how to handle them or what they are. So there is little reason to spend the time coding a system to read this, since nothing uses it formally any more. These days, the only reliable determining factor is the extension. Most programs are pretty good about having their own extensions (and being able to have more than 3 letters is so nice!), but some legacy programs, and applications made by monstrously huge corporations (ahem), still don’t get it.

Well, that worked. I’ll just have to remember to slap that .doc extension on Word files before I import.
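For a folder full of old files with no extensions, a small script can suggest which ones probably need .doc or .rtf before import. This is a hedged Python sketch: `suggest_extension` is a made-up name, and the check is only a guess based on the first few bytes. The 8-byte signature below is the one used by the legacy OLE container that old .doc files are stored in, but other old Office formats (.xls, .ppt) share it, so treat ".doc" as a likely answer rather than a certain one.

```python
# Signature at the start of legacy OLE compound files (old .doc, .xls, .ppt).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def suggest_extension(path):
    """Guess an extension from a file's first bytes; return None if unsure."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE_MAGIC):
        return ".doc"  # assumption: an OLE file here is a Word document
    if head.startswith(b"{\\rtf"):
        return ".rtf"  # RTF files begin with the text {\rtf
    return None        # could be plain text or something else entirely


# Example call (hypothetical file name):
# print(suggest_extension("old-chapter"))
```

It only reads the first eight bytes and never modifies anything, so renaming stays a manual decision.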