Unicode Bug with Footnotes

I have on and off had serious encoding problems with texts in Scrivener. It seems that something about the way Scrivener handles Unicode text containing 2-byte characters (Japanese, Chinese, etc.) inside footnotes causes complete mangling of all non-Roman text in the document when exported to Microsoft Word or Mellel (the problem does not exist with TextEdit). I suspect this is due to a conflict between the Unicode characters and the beginning/end of the footnote marker ({\SCRV_FN= \END_SCRV_FN}).

When it happens, someone like me, writing a history dissertation with no Chinese/Japanese/Korean in the body of the text but frequent use of C/J/K in the footnotes, will have completely mangled footnotes when exporting some but not all documents. It took me ages to track this down to a reproducible problem, but now I can report it as a bug with exact steps to reproduce:

  1. Create a new Scrivener project+document
  2. Type:

This is a test[Hello 中國] test.

The [ … ] should be within footnote-marked text (command-shift-f). Make sure that the “test” after the end of the footnote has not continued using the Chinese font but has returned to the original font in use before you switched into Chinese. That is, switch back to the English keyboard before you begin typing “test” at the end.

  3. Copy the text and paste it into an empty Microsoft Word or Mellel document. Note that the Chinese word 中國 in the footnote has become mangled and unreadable.

  4. Return to Scrivener. Add the following single Chinese character at the end: 中

  5. Copy the text and paste it into an empty Microsoft Word or Mellel document. Note that none of the Chinese is mangled anymore.

As long as there is some Chinese outside of the footnote, this error does not occur, but under some conditions I have not been able to establish (when the Chinese comes at the end of a footnote?) it happens when there is C/J/K text in a footnote but not in the main text. This is highly problematic because in scholarly writing we usually leave all C/J/K writing out of the body and sometimes put it in the source references of a footnote.

Also note:

This also happens with Japanese and Korean. Try the following text as well:

This is a test[hello 日本語] ok.
This is a test[hello 한국어] ok.
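
For anyone who wants to check whether a given passage is at risk, the triggering condition described above (C/J/K text inside a footnote, none in the body) can be sketched as a small Python check. This is only an illustration: it assumes footnotes are written inline as [ ... ], as in the samples in this report, the function names are made up for the example, and the character ranges are a rough subset of C/J/K rather than a complete list.

```python
import re

# Rough C/J/K coverage: Hiragana and Katakana, CJK Unified Ideographs
# (plus Extension A), and Hangul Syllables. An illustrative subset only.
CJK_PATTERN = re.compile("[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")

# Assumption: footnotes are marked inline as [ ... ], as in this report.
FOOTNOTE_PATTERN = re.compile(r"\[(.*?)\]")


def split_body_and_footnotes(text):
    """Return (body_without_footnotes, list_of_footnote_texts)."""
    footnotes = FOOTNOTE_PATTERN.findall(text)
    body = FOOTNOTE_PATTERN.sub("", text)
    return body, footnotes


def cjk_only_in_footnotes(text):
    """True when C/J/K characters occur in a footnote but nowhere in the body."""
    body, footnotes = split_body_and_footnotes(text)
    in_body = bool(CJK_PATTERN.search(body))
    in_notes = any(CJK_PATTERN.search(note) for note in footnotes)
    return in_notes and not in_body
```

With this sketch, "This is a test[Hello 中國] test." is flagged, while the same line with a single 中 appended to the body is not, which matches the reproduction steps.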

I assume something happens when the text is put on the clipboard, or during the full export process, that mangles this text. An annoying workaround for the time being is to add one Chinese character somewhere in the body of the text, export it, and then delete this single character, but this is not obvious and will hopefully be unnecessary once the issue has been resolved.

Thanks for your hard work on the development of Scrivener, which is the home of my dissertation.

Yup! That is indeed a bug, good catch. The good news is that it is already fixed, in 2.0 anyway, which will most likely be out some time in October. So you shouldn’t have to worry about it for much longer.

In case you are curious, the problem is that Scrivener was not properly defining the RTF font style when it only appears inside a footnote. So if the only place 2-byte characters were being used was inside footnotes, no special font would be declared and the footnotes would fall back to the base font (likely not fully Unicode-aware), i.e. garbage characters. If you put a Chinese character anywhere in the body of the document, then the special font gets declared, and footnotes work.
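
The mechanism can be illustrated with a toy model in Python. To be clear, this is not Scrivener's code: the `Run` type, the function names, and the font name "ExampleCJKFont" are all hypothetical, and the sketch only mimics the idea that fonts used solely inside footnotes were left out of the declared font list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A stretch of text in one font (hypothetical model, not Scrivener's)."""
    text: str
    font: str
    in_footnote: bool = False


def declared_fonts(runs, include_footnotes):
    """Fonts that would be declared in the header, in first-seen order."""
    fonts = []
    for run in runs:
        if run.in_footnote and not include_footnotes:
            continue  # models the bug: footnote runs are ignored
        if run.font not in fonts:
            fonts.append(run.font)
    return fonts


def undeclared_footnote_fonts(runs, include_footnotes):
    """Fonts used in footnotes that never made it into the declared list."""
    declared = set(declared_fonts(runs, include_footnotes))
    return sorted({r.font for r in runs if r.in_footnote and r.font not in declared})


sample = [
    Run("This is a test", "Helvetica"),
    Run("Hello ", "Helvetica", in_footnote=True),
    Run("中國", "ExampleCJKFont", in_footnote=True),
    Run(" test.", "Helvetica"),
]

print(undeclared_footnote_fonts(sample, include_footnotes=False))  # ['ExampleCJKFont']
print(undeclared_footnote_fonts(sample, include_footnotes=True))   # []
```

Appending a body run such as `Run("中", "ExampleCJKFont")` to `sample` makes the first call return an empty list as well, which is the toy-model equivalent of the workaround.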

Oh, and the reason why this bug only showed up in programs that handle RTF footnotes in the pasteboard is that programs that don’t (like TextEdit) use a different alternative in the pasteboard (probably the RTFD alt), which essentially flattens everything into pseudo-footnotes. That makes the footnote normal text, so, just as with the second sample (where Chinese appears in the body), it gets a proper RTF header to address it. It also won’t happen in Scrivener-to-Scrivener pastes because Scrivener has its own special alt which it uses, maintaining all of its internal formatting like dynamic footnotes.
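
As a rough sketch of that selection process, here is a small Python model. The labels are plain illustrative strings, not real macOS pasteboard type identifiers, and the preference orders are assumptions inferred from the behaviour described above (TextEdit's in particular is only a guess).

```python
# Labels for the pasteboard alternatives mentioned above.
# Illustrative strings only, not real macOS pasteboard type identifiers.
RTF_WITH_FOOTNOTES = "rtf"
FLATTENED_RTFD = "rtfd"
SCRIVENER_PRIVATE = "scrivener-private"

# Assumed preference order per receiving program (hypothetical).
PREFERENCES = {
    "Scrivener": [SCRIVENER_PRIVATE, RTF_WITH_FOOTNOTES, FLATTENED_RTFD],
    "Word": [RTF_WITH_FOOTNOTES, FLATTENED_RTFD],
    "Mellel": [RTF_WITH_FOOTNOTES, FLATTENED_RTFD],
    "TextEdit": [FLATTENED_RTFD],
}


def pick_alternative(app, available):
    """Return the first alternative the app prefers that is on the pasteboard."""
    for kind in PREFERENCES[app]:
        if kind in available:
            return kind
    return None


def hits_footnote_bug(app, available):
    """True if the app would read the alternative with the incomplete font header.

    Assumes the copied text has C/J/K characters only inside footnotes.
    """
    return pick_alternative(app, available) == RTF_WITH_FOOTNOTES


on_pasteboard = {RTF_WITH_FOOTNOTES, FLATTENED_RTFD, SCRIVENER_PRIVATE}
```

Under these assumptions, `hits_footnote_bug` is true for Word and Mellel and false for TextEdit and Scrivener, since only the first two end up reading the RTF alternative that carries real footnotes.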

Can I point out that this is not a problem with Nisus Writer Pro, which uses UTF-8 as standard?

Mellel obviously still does not. I used Mellel when I first moved onto OS X, as it was much better than anything else, but it wouldn’t import .doc files from WinWord in Chinese … it used single-byte ASCII on import, turning the C/J/K into garbage. I had to open the .docs in TextEdit first, save them out as RTF and open that in Mellel. Of course, footnotes didn’t come into the equation.

I’m a Microsoft-free zone, so I can’t comment on Word, but it doesn’t surprise me.

Mark

Actually, in my testing anyway, the reason why it works with Nisus is that it doesn’t. For some curious reason (Nisus is usually cream of the crop for this sort of stuff) it takes the RTFD pasteboard version (which will work in all programs), not the RTF version with footnotes. So while you get a result that looks right, it isn’t using real footnotes, and you could get the same result by pasting into TextEdit first, and then pasting into Mellel or Word. To be clear, the problem isn’t with either Mellel or Word. They are turning the RTF pasteboard into footnotes precisely as they were requested to. The problem is with the request, which doesn’t contain all of the proper font encoding information unless there are multi-byte characters in the rest of the document.