I am having trouble importing into Scrivener very simple multimarkdown text files saved-as-text from Word. Simple as in
If I type that in TextEdit and save-as plain text, it imports into Scriv fine.
But if I type it in Word and save as plain text, the resulting file does not import properly into Scriv. Inside the resulting container folder in the binder is just one document, named ‘first header’, and that is all.
As far as one can see, the two plain text files are identical. What demon weirdness is this?
Is Word playing fast and loose with Plain Text format or is Scrivener import being overly sensitive to some internal variance in plain text files?
In hopes that someone might be able to see what I cannot, I have attached a zip file containing two identically typed .txt files of the sort described above.
Any inklings would be greatly appreciated. It is bugging me pretty much.
Seemingly identical, but I bet Word is producing CRLF style carriage returns by default, rather than LF, which is what the Mac uses these days. It has to do with the actual code being used to tell text editors when a line ends. If the text editor, or application attempting to read the file, is not expecting Windows DOS style carriage returns, it might only see the first bit of the file properly.
I’m pretty sure Word has a way of choosing carriage return styles.
I just tentatively confirmed this: I opened the text file produced in TextEdit in TextMate and saved it with CRLF line endings. When imported into Scrivener, it gave a result identical to the Word-produced file.
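For anyone who wants to check which line endings a file really contains, rather than trusting how it looks in an editor, here is a minimal sketch in Python. The function name and the script usage are made up for illustration; it only counts bytes and assumes an 8-bit encoding such as ASCII or UTF-8.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF (DOS), lone CR (classic Mac), and lone LF (Unix) endings."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the endings that stand alone.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_endings.py word.txt textedit.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode: no newline translation
            print(path, count_line_endings(f.read()))
```

Run against the two attached files, a difference in the counts would show whether line endings are what separates them.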
I had thought of that, but Word has two save-as options, ‘Text Only’ and ‘Text Only (MS-DOS)’, and I have always assumed the former generated “Mac” paragraph breaks, and the latter DOS paragraph breaks–and indeed, TextWrangler confirms this.
Now, my (dusty) recollection is that “Mac” text files had always used CR and DOS used CRLF. Word can generate both, but neither works for Scriv import. So I am now thinking the key is that the TextEdit file is using a third standard that Scriv likes, namely the Unix-style plain text file. (Perhaps it is Unix that uses LF for paragraph breaks?) [Added: Makes sense that the Mac standard probably shifted with OS X to the Unix standard. But it looks like Word 2004 ignored this.]
I followed up in TextWrangler after your post. TW lets you choose between Mac, Unix, and DOS line breaks when saving a text file. And, indeed, what I found was that the only one that produces a Scriv-importable MM file is Unix.
Since Word can only save text files with Mac and DOS breaks, that makes clear why I am having a problem. Scriv is only accepting “Unix”-style plain text files.
Thanks, Amber, that was very helpful. --Greg
P.S. Sadly, Word does not have any settings for making Unix-type text files. So, there is no Word resolution to my little problem.
Thanks to the helpful analysis, I was able to resolve this hang-up with AppleScript: the script replaces all CRLFs (or solo CRs) with LFs before saving as Text Only.
This works, because it appears that Word does not enforce the Mac-style endings on paragraphs if they have been stripped out prior to saving, even if you choose the Mac-style plain text when saving. TextEdit is permissive in the same way. Seems weird but true.
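The same replacement can also be done on a file that has already been saved. This is a rough sketch in Python, not the AppleScript described above; the function names are invented for illustration, and it assumes an 8-bit encoding (ASCII, UTF-8, or similar), so it should not be used on UTF-16 files.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Turn CRLF (DOS) and lone CR (classic Mac) endings into LF (Unix)."""
    # Replace CRLF first so each pair becomes a single LF; only then
    # convert whatever lone CRs are left.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write an LF-only copy to dst."""
    Path(dst).write_bytes(to_unix_line_endings(Path(src).read_bytes()))
```

The order of the two replacements matters: converting lone CRs first would turn every CRLF into two LFs and so double the paragraph breaks.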
My guess is that it is not a UTF-8 thing except in so far as the “Unix”-type text file and UTF-8 must agree about what an end paragraph looks like.
Since killing the CRs makes the import work, I suspect that all three of the text file formats that have been in play here are really just living on the backward compatibility of UTF-8 when it comes to the import to Scriv. So, the real action is just about differences in end-paragraph standards.
I don’t really have a way to explore more closely. The closest Word can get to UTF-8 is UTF-16, but Scriv dislikes UTF-16 files even more than it was disliking my Mac and DOS text files–refusing to import them at all.
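One way to explore the encoding side separately would be to re-encode a UTF-16 file as UTF-8 outside Word and then try the import again. A hedged sketch in Python follows; the function name is made up, it assumes the file starts with a byte-order mark so the byte order can be detected, and whether Scriv would accept the result is untested.

```python
def utf16_to_utf8_unix(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 with LF-only line endings."""
    # The "utf-16" codec reads the byte-order mark to pick the byte order
    # and drops it from the decoded text.
    text = data.decode("utf-16")
    # Normalize on decoded text, since in UTF-16 each CR or LF is two bytes.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```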
And, of course, now that I have a fix for my purposes, I can afford to be just that fuzzy on file formats.