When I compile to PDF, the umlauts (e.g. “ü”, Unicode U+00FC) in the resulting manuscript aren’t single characters but two: the plain vowel (e.g. “u”, U+0075) followed by a separate diaeresis (“¨”, U+00A8). You can easily verify this by copying text containing umlauts from a compiled PDF into a plain-text editor such as BBEdit and then deleting letters from the end of a word: the “ü” doesn’t disappear in one keystroke; first the diaeresis is deleted, leaving the “u”. (When I compile to Word format, the manuscript contains true, single-character umlauts.)
How can I get Scrivener to create PDFs with true umlauts?
The problems are that
- searching a PDF that contains, for example, Tu¨r for the German word Tür, “door”, doesn’t return any results;
- blind people do not hear the correct word when text is read to them;
- and printing from PDF may lead to strange placement of the diaeresis.
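If you only need to fix text you have already copied out of Preview, the pairs can be repaired after the fact. Below is a minimal Python sketch, assuming the pasted text contains exactly what is described above: the letter followed by a spacing diaeresis U+00A8. Plain Unicode normalization won't do this on its own, because U+00A8 is a stand-alone character rather than a combining mark, so the sketch first swaps it for COMBINING DIAERESIS (U+0308) and then composes. `repair_spacing_diaeresis` is a hypothetical helper, not part of Scrivener or any PDF tool.

```python
import re
import unicodedata

# U+00A8 DIAERESIS is a spacing (stand-alone) character; U+0308 is the
# combining version that attaches to the preceding letter.
SPACING_DIAERESIS = "\u00a8"
COMBINING_DIAERESIS = "\u0308"

# A letter immediately followed by a spacing diaeresis, e.g. "u¨".
# Assumption: the mark always comes *after* the letter in the pasted text.
_PAIR = re.compile(r"([^\W\d_])" + SPACING_DIAERESIS)


def repair_spacing_diaeresis(text: str) -> str:
    """Turn 'u¨' style pairs into precomposed letters like 'ü' (hypothetical helper)."""
    # Swap the spacing mark for the combining one so it attaches to the letter...
    combined = _PAIR.sub(lambda m: m.group(1) + COMBINING_DIAERESIS, text)
    # ...then NFC composes letter + combining mark into one code point where
    # Unicode defines one (u + U+0308 -> U+00FC).
    return unicodedata.normalize("NFC", combined)


if __name__ == "__main__":
    broken = "Tu\u00a8r"
    fixed = repair_spacing_diaeresis(broken)
    print(fixed, len(broken), len(fixed))  # Tür 4 3
```

This only cleans up copied text; it doesn't change how the PDF itself stores or displays the characters.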
I just found out that the problem has nothing to do with Scrivener. When I open the PDF in Adobe Acrobat Pro, Chrome, or Firefox, the umlauts are fine; only when I open it in Apple’s Preview or Safari are the umlauts represented as vowel plus diaeresis. It seems to be an Apple problem.
So as far as Scrivener is concerned, the problem has been resolved.
There are two ways to encode accented characters in Unicode: as a single precomposed character, or as the base character followed by a combining character. See http://www.fileformat.info for more.
For instance, ü is LATIN SMALL LETTER U WITH DIAERESIS (U+00FC), but you can get the same result with a “u” followed by COMBINING DIAERESIS (U+0308). These two forms should look the same, but I have had some programs (Acrobat Reader, I think?) display them differently with some fonts, which is very confusing. I have even had to go through text documents converting one form to the other, though for the life of me I can’t remember why right now.
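To see the difference concretely, here is a small Python sketch using the standard `unicodedata` module. The two forms compare unequal as raw strings, but become equal once both are normalized to the same form (NFC composes into the single code point, NFD splits it into base letter plus combining mark).

```python
import unicodedata

precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS, one code point
decomposed = "u\u0308"   # "u" + COMBINING DIAERESIS, two code points

# Both usually render as "ü", but a naive comparison sees different strings.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2

# Listing the code points makes the difference visible.
for s in (precomposed, decomposed):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])

# Normalizing to a common form makes them equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)  # True True
```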
Anyway, since the two forms usually look identical, it mostly doesn’t matter. But if there is a bug somewhere, or if a search algorithm is written poorly, it can be very confusing. It is especially annoying if you have copied and pasted from various sources that used different forms.
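One way a search can avoid this trap is to normalize both the text and the search term to the same form before comparing. A minimal sketch, assuming NFC as the common form; `normalized_find` is a hypothetical name, and the index it returns refers to the normalized text, not the original. It does not help with the spacing diaeresis U+00A8 from the original question, which normalization leaves alone.

```python
import unicodedata


def normalized_find(haystack: str, needle: str) -> int:
    """Return the index of needle in haystack after NFC-normalizing both, or -1.

    The index refers to the normalized haystack, which may be shorter than the
    original if it contained decomposed characters.
    """
    return unicodedata.normalize("NFC", haystack).find(
        unicodedata.normalize("NFC", needle)
    )


doc = "Die Tu\u0308r ist zu."             # "Tür" typed with a combining diaeresis
print("T\u00fcr" in doc)                  # False: plain substring search misses it
print(normalized_find(doc, "T\u00fcr"))   # 4
```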
Hope this helps you understand the issue. I don’t know how to solve your problem. If it is really driving you insane, you can use a text editor (like TextMate) to look inside the .rtf code and find and change the offending parts, but I don’t recommend doing that if you can help it.
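If you do go digging in the .rtf, a script can at least locate the decomposed characters for you. A rough Python sketch, assuming non-ASCII characters are stored as RTF `\uN` escapes (N in decimal, signed 16-bit, the usual RTF convention); I haven't checked exactly what Scrivener writes, and `find_combining_escapes` is a hypothetical helper. It only reports positions; it doesn't change anything.

```python
import re
import unicodedata

# RTF writes non-ASCII characters as \uN escapes with N in decimal.
# Control words like \uc0 or \ul don't match, since a digit must follow \u.
_RTF_UNICODE = re.compile(r"\\u(-?\d+)")


def find_combining_escapes(rtf_source: str) -> list[tuple[int, str]]:
    """Return (offset, name) for each \\uN escape that is a combining mark."""
    hits = []
    for m in _RTF_UNICODE.finditer(rtf_source):
        code = int(m.group(1))
        if code < 0:
            code += 65536  # RTF stores values above 32767 as signed 16-bit
        ch = chr(code)
        # A nonzero canonical combining class means the character attaches
        # to the one before it (e.g. U+0308 COMBINING DIAERESIS).
        if unicodedata.combining(ch):
            hits.append((m.start(), unicodedata.name(ch, f"U+{code:04X}")))
    return hits


sample = r"{\rtf1 T\uc0\u252 r and Tu\uc0\u776 r}"
for offset, name in find_combining_escapes(sample):
    print(offset, name)
```

Here the precomposed `\u252` (ü) is ignored and only the `\u776` (COMBINING DIAERESIS) is reported.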