What does MMD Import actually require of a text file?

I am trying to diagnose cases where feeding a plain text document to Scrivener’s MMD-import results in the error message “The file is not a valid multimarkdown file. Nothing was imported.”

Can someone tell me what the operative conditions are, i.e. what conditions the routine is testing for? I have figured out a few of the gotchas, but it has become clear that there is at least one more!


BACKGROUND: I am working with a workflow like the following: ordinary text is placed in a Word document, a script is run to massage the text and prepare it for MMD import, and the file is saved as Text Only. The result is then imported into Scrivener using the MMD import function.

The case at hand is one where the document is just plain text from a Project Gutenberg document, and the script has done little but insert some # marks at the beginnings of certain lines. The script has also performed the following get-ready-for-MMD-import functions: i) remove any headers, footers, footnotes, endnotes, and comments; ii) crush all CRLF and CR characters to simple LFs. In an earlier script these moves had seemed to turn the trick, but now I have cases where they are evidently not enough.

HISTORY: I had (or thought I had) worked through this problem with an earlier script. There I learned that one needed to strip out all headers, footers, footnotes, and comments, but most importantly, to force all CRLF and CR characters to simple linefeeds. [By the way, the need to crush all CRLF and CR is a real bummer, because i) Word does not have an option to save to .txt with just LF line breaks, and ii) it is impossible to accomplish in any way except programmatically (e.g., it cannot be done through Word search & replace). So, if there is any way the importer could avoid requiring this, that would certainly be a delight.]
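For what it is worth, the crush-to-LF step my script performs can be sketched in Python (operating on the exported bytes rather than inside Word; the sample string is hypothetical):

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Crush CRLF and lone CR line breaks to simple LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Word's "Text Only" export uses CR (classic Mac) or CRLF line breaks:
sample = b"# CHAPTER I\r\nSome text.\rMore text.\n"
print(normalize_line_endings(sample))  # b'# CHAPTER I\nSome text.\nMore text.\n'
```

The order matters: replacing CRLF first ensures a CRLF pair is not turned into two LFs.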

Here is a small file which evidences the sort of trouble I am having.

This two-paragraph file was produced in the following way: I took a file of about 80,000 words which was giving me the import problem. Using a binary search procedure, I arrived by process of elimination at a pair of paragraphs that occur near (but not at) the end of the document.

The elimination procedure made clear that most of the 80,000 word original is unproblematic for import, so the import routine is balking at something very local within these paragraphs.

The attached file is a text file (Unix LF style) produced by TextWrangler. The Scriv-MM import fails on it. Both of the paragraphs generate the import failure when taken singly as well.

Any clues on this particular case?

Unless the MMD import routine has a schlock filter which these lines from Anna Katherine Green are triggering, I cannot see what the issue could possibly be.

If paragraphs like these can precipitate an import failure, you can see why I am anxious to see if I can get info (if feasible) on the general parameters a file must meet to make it through the import process!

Thanks for anything and everything.


Scriv 1.11. Tiger 10.4.11. Powerbook G4. Brain 0.34b
mmd-import-problem-case.txt (1.36 KB)

It’s probably just a minor bug in Scrivener. There has to be at least one # section heading # in an MMD file.
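If the importer really does require at least one # heading, a pre-flight check could be sketched as follows (the exact rule Scrivener applies is a guess; this just looks for any ATX-style heading line):

```python
def has_mmd_heading(text: str) -> bool:
    """Return True if any line starts with '#', i.e. an ATX-style heading."""
    return any(line.startswith("#") for line in text.splitlines())

print(has_mmd_heading("# CHAPTER I\n\nSome body text.\n"))  # True
print(has_mmd_heading("Just plain paragraphs.\n"))          # False
```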

That is very helpful. Unfortunately, it is only my sample file which could have run afoul of this one-# requirement; there must be at least one other hitch.

We can get the sample file I posted to import if we insert some # headers, but my original (large) source file (which I chopped down to the sample file) contained many # headers but would not import.

So, what this shows is that my troubleshooting (binary) search procedure needs to be redone with this at-least-one-# requirement in mind, to uncover what else is going on. I will have another go at it and post again.

Thanks for the insight!


The improved search procedure led me to a line, two paragraphs away from the text of my earlier sample, which contained an occurrence of the term:


So, there was one high-bit ASCII character in the book text, and that is what Scriv’s import routine appears to be choking on.

Right now that means that I will have to add the following step to my workflow: open my script-prepared text file in TextWrangler and tell it to crush any high-bit characters to low-bit ones. Something of a bother.

Maybe the import routine just presupposes all low-bit ASCII at some point and throws an exception when it hits my é. A more permissive MMD import would seem a virtue, so I will drop a note to the wish list (bug report?).
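For anyone else hunting a stray high-bit character, here is a sketch of a locator in Python; it reports 1-based line and column positions of every byte outside 7-bit ASCII (the sample text is made up):

```python
def find_high_bit_bytes(data: bytes):
    """Yield (line, column, byte_value) for every byte above 0x7F.

    Assumes the file already uses plain LF line breaks.
    """
    line, col = 1, 1
    for b in data:
        if b > 0x7F:
            yield line, col, b
        if b == 0x0A:  # LF starts a new line
            line, col = line + 1, 1
        else:
            col += 1

sample = "caf\u00e9 noir\n".encode("mac_roman")
print(list(find_high_bit_bytes(sample)))  # [(1, 4, 142)]; 142 is 0x8E, Mac Roman é
```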

Thanks again for the help, which in the end helped me get a lock on the real problem.


Actually it imports as UTF8 so I’m not sure why this would cause a snag. The necessity for a hash thing is known and is on the list to address for the next update (hopefully).

Ah, it is expecting UTF-8. You probably alerted me to this the last time I was troubleshooting an MMD import script. My text files were in Western Roman (Mac) with Unix (LF) line breaks.

The trouble for me is that MS Word does not know how to save text files with either of these features.

I worked around this last time by having the script force LF line breaks throughout the Word document before save time. And this worked for MMD import for all the documents I have processed with that earlier script. But, as you will see, this process was giving me Unix LF line breaks with Western Roman (Mac) encoding, which works fine with the Scriv import until you hit a high-bit character.

So, this gives me a slightly improved workaround (i.e., opening the result in TextWrangler and resaving the text document with UTF-8 encoding).
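That resave step can also be scripted, which would avoid the manual TextWrangler round-trip. A minimal sketch, assuming the input bytes really are Mac Roman (the sample word is hypothetical):

```python
def mac_roman_to_utf8(data: bytes) -> bytes:
    """Re-encode Western Roman (Mac) bytes as UTF-8."""
    return data.decode("mac_roman").encode("utf-8")

# 0x8E is é in Mac Roman; it becomes the two bytes 0xC3 0xA9 in UTF-8.
print(mac_roman_to_utf8(b"fianc\x8ee"))  # b'fianc\xc3\xa9e'
```

Note that plain-ASCII bytes pass through unchanged, which is why Western/LF/no-é files look indistinguishable from UTF-8 files.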


P.S. For what it is worth, my rational mind tells me it should be impossible that the Scriv routine fails on my Western/LF/é but succeeds on Western/LF/no-é documents (as it does). Here is why:

A hex dump of the file contents (in TextWrangler) of a Western/LF/é and a UTF-8/LF/é file suggests that between the two the exact same encoding of the text content is used, including for things like é, at least as far as my test file goes. Scriv rejects the former and accepts the latter.

Now, evidently Scriv is not reacting to them differently because the actual text encoding is different; it isn’t.

And we know Scriv is not reacting differently to them just because it sees the one is not a UTF8 file (i.e., just on principle). We know this because it happily imports Western/LF/no-é files with no trouble.

This makes it quite mysterious (to me) why the one will import and the other not.

Now maybe i) TextWrangler is misleading me about the content encoding, or ii) the existence of high-bit characters in the text changes something in the header/resource area of the file (so that Scriv expects something in the non-content area that it does not find there in my files, but only when there are such characters in the text body). But I wouldn’t have thought either of these possibilities very likely.
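Possibility (i) is at least easy to test directly: in Mac Roman, é is the single byte 0x8E, while in UTF-8 it is the two-byte sequence 0xC3 0xA9, so two files that both truly contain é should not be byte-identical across the two encodings. A quick check:

```python
# Compare how the same é character is encoded under each hypothesis.
e_acute = "\u00e9"
print(e_acute.encode("mac_roman"))  # b'\x8e'
print(e_acute.encode("utf-8"))      # b'\xc3\xa9'
```

If the hex dumps of the two files really do show identical bytes for é, then one of the files is not in the encoding its label claims.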

Try opening the file in TextEdit, using File > Open and selecting UTF-8 as the encoding, and see what happens there. I believe it is possible to batch-convert the encoding of text files, though I don’t recall how; Amber will be able to fill you in if she sees this.

Thanks, Keith. Yes, using TextEdit or TextWrangler I can definitely put the UTF-8 blessing on my text files.


P.S. At the moment it does not appear that one can put this step under AppleScript control.