Beginner's Question about jumbled Chinese Characters

Hello all,

I’ve been experimenting with MMD for the past week so far for writing on the iPad, and I love it. Right now I write in Scrivener, and sync it to Simplenote when I’m out and about. But there’s one thing that I’ve been having trouble with – my documents are a mixture of English and Chinese characters. When I sync the Chinese characters between Scrivener and Simplenote, the characters come out fine. But when I compile the file either to MMD or MMD–>RTF, the Chinese characters get jumbled into code like this: Âúã‰ΩõÊïôÂè≤Á±çÊߙ˴ñ . Can anybody tell me how to fix this? I’ve tried searching the forums but haven’t been able to find an answer.

Many thanks!

The compile path for RTF looks like this:

  1. Scrivener pre-processing
  2. MMD raw file production
  3. XHTML conversion
  4. Execution of textutil UNIX system utility

You can easily spot-check #2 and #3 for valid Unicode output. First compile as plain MultiMarkdown and examine the source file; if that looks good, try the MMD -> HTML compile method. If the result also looks good in a web browser, then it must be textutil that is messing things up, and that unfortunately may be more difficult to solve. The man page for it might list some options for handling multi-byte characters, but it looks like whatever flags MMD is using by default are damaging these characters and giving you a salad of single-byte characters.
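For what it's worth, the start of the sample in the first post is consistent with UTF-8 bytes being read back one byte at a time as Mac OS Roman. Here is a minimal Python sketch of that effect; the function names are made up for illustration, and I am assuming Mac OS Roman is the single-byte encoding involved, which is only verified here against the first character of the sample.

```python
def misread_utf8_as_mac_roman(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode each byte as Mac OS Roman."""
    return text.encode("utf-8").decode("mac_roman")


def repair_mac_roman_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


sample = "國"  # three bytes in UTF-8: E5 9C 8B
garbled = misread_utf8_as_mac_roman(sample)
print(garbled)                             # Âúã
print(repair_mac_roman_mojibake(garbled))  # 國
```

If a garbled file round-trips cleanly this way, the text itself is probably intact and only the declared or assumed encoding is wrong somewhere in the chain.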

Thanks so much for these suggestions! The characters look fine when I process it straight using the MMD drag and drop, and also using Scrivener’s MMD–>HTML output. Do you have any suggestions how I would go about checking the textutil utility?

Thanks again for your help!

I could take a look at it, but am a bit unqualified. :) If you could send me some sample Chinese text in an MMD produced XHTML file, that would really help me out. Send me stuff at (support AT literatureandlatte DOT com).

Incidentally, the best work-around I’ve found for the moment is to compile as HTML, and then open that in TextEdit with HTML parsing enabled in TextEdit’s preferences (default behaviour, I believe). From there, you can save as an RTF. It will have the old black-paper bug, but this can be easily fixed with the font palette.

The Problem: I finally got a test project with some Chinese in it so I could play around. Here’s the deal: newer versions of MMD have a custom RTF generator which by and large is superior to the old method, which used textutil to convert. The textutil method would just do a “camera conversion”: it would take an HTML file and create an RTF look-alike, and thus would inherit all of the flaws that came along with HTML, such as no real footnotes, no cross-reference linking, etc. So the new version is superior in that it has better support for real RTF codes, but it has one flaw: it doesn’t properly encode Chinese characters. Here is an example of what a line looks like coming out of textutil:

\f0\b\fs36 \cf0 \uc0\u26631 \u39064 \

Here is the same line (I think, but if it isn’t, my point is not affected by the discrepancy) coming out of MMD:

\s2 {\*\bkmkstart }标题{\*\bkmkend }

As you can see, the characters are left in raw form rather than being escaped.

The solution to this is probably non-trivial. Basically we would need an open source solution that can convert Unicode characters to escaped equivalents, which could then be inserted into the conversion process in the appropriate location. It wouldn’t be hard to trigger MMD into this alternate mode. You could set the “Language” meta-data field to “chinese” and then have the program insert this extra routine into the process as necessary—just need the routine. The tricky part would be making sure it only impacts multi-byte characters and not the whole file.
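To give a rough idea of what such a routine might look like, here is a minimal Python sketch. It is not part of MMD, and the function name is made up for illustration. It assumes the RTF convention that \uN takes a signed 16-bit decimal value followed by a one-character fallback (here "?") for readers that do not understand Unicode. ASCII characters, including RTF's own backslashes and braces, pass through untouched, so only the multi-byte characters are affected.

```python
def rtf_escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with RTF \\uN? escapes.

    Hypothetical helper for illustration only. ASCII is left as-is so
    existing RTF control words in the input are not disturbed.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Split into UTF-16 code units so characters outside the Basic
        # Multilingual Plane become a surrogate pair of two escapes.
        data = ch.encode("utf-16-be")
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            if unit > 32767:
                unit -= 65536  # express as a signed 16-bit value
            out.append(f"\\u{unit}?")
    return "".join(out)


line = r"\s2 {\*\bkmkstart }标题{\*\bkmkend }"
print(rtf_escape_non_ascii(line))
# \s2 {\*\bkmkstart }\u26631?\u-26472?{\*\bkmkend }
```

Note that the second character comes out as -26472 rather than the 39064 seen in the textutil line; both refer to the same 16-bit code unit, and the signed form is the one the sketch assumes. The textutil output also sets \uc0, which tells the reader that no fallback character follows, so the trailing "?" is not needed there.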

Temporary Solution: The best thing to do for now is to make the HTML-to-RTF workaround automatic. Fortunately this is easy to accomplish. The old RTF technique is still distributed with the MMD installation, so follow these steps:

First, open Terminal.app from Applications/Utilities.

If you have MMD installed in your Application Support folder, type in:

cd ~/Library/Application\ Support/MultiMarkdown/bin

If you do not have MMD installed, either do so at this point (recommended), or use this command instead:

cd /Applications/Scrivener.app/Contents/MacOS/MultiMarkdown/bin

Okay now enter these commands in order:

mv mmd2RTF.pl mmd2RTF-old.pl
ln -s multimarkdown2RTF.pl mmd2RTF.pl

Scrivener executes mmd2RTF.pl when you compile, so what we’ve done is move the new script aside and make mmd2RTF.pl a symbolic link to the old script, which uses textutil. Scrivener is now using the old method.

Note: The following bug fix is for older versions of Snow Leopard. Apple has since fixed the “black paper bug”, so give it a shot first and see if you get a normal looking RTF file. If it is black, then proceed with the work-around below. The black paper bug is just what it sounds like: the paper colour of the file comes out black. If you change it to white you will see the text, so it isn’t a huge problem, but this solution will remove the necessity of having to do that every time you compile.

Try compiling again from Scrivener. You will get the black-paper bug if it worked. To get around this bug in the future, add the following MMD meta-data to all of your projects:

Field: XHTML Header
Value: <style type="text/css" media="all">body {background-color: #fff}</style>

Now you should be good to go.