Export .txt problem: sufficient gets suf?cient [BUG LOGGED]

Dear Scrivener Team,

Since I use plain-text export and MMD together with a home-brewed script environment ( https://forum.literatureandlatte.com/t/an-approach-to-exporting-mmd-to-latex/9122/1 ), proper plain-text export is important to me.

What happens:
* compiled to plain text
* in several words "fi" is replaced by "?", e.g. finger -> ?nger and sufficient -> suf?cient. Capital "Fi" is not replaced!
* "fl" was also replaced by "?" (fluid -> ?uid)
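A minimal Python sketch of one mechanism that would produce exactly this pattern, assuming the affected words contain the single Unicode ligature characters U+FB01 (ﬁ) and U+FB02 (ﬂ) rather than the letter pairs: neither character exists in Windows-1252, so an export that encodes to that code page with a replace-on-error policy writes "?" instead. This is an illustration, not Scrivener's actual export code; the names `export_cp1252`, `LIG_FI` and `LIG_FL` are hypothetical.

```python
# Hypothetical reproduction: the words contain single ligature code points,
# not the two-letter sequences "fi" and "fl".
LIG_FI = "\ufb01"  # LATIN SMALL LIGATURE FI
LIG_FL = "\ufb02"  # LATIN SMALL LIGATURE FL


def export_cp1252(text: str) -> bytes:
    """Encode to Windows-1252, replacing characters it cannot represent with '?'."""
    return text.encode("cp1252", errors="replace")


samples = ["suf" + LIG_FI + "cient", LIG_FI + "nger", LIG_FL + "uid", "Fi"]
for word in samples:
    print(repr(word), "->", export_cp1252(word))
```

Under that assumption the capital "Fi" survives because Unicode has no capital fi ligature, so it is always stored as two ordinary letters.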

This problem seems to be font-dependent (I use Courier, Verdana, MS Shell Dlg 2, and Cambria) or document-dependent, since in several documents not a single replacement occurred.

Guess:
Is "fi" perhaps interpreted as a command during the export process (for example, as the end of an open if statement)? It might also be related to some RTF formatting.

To be able to check the resulting .txt files, it would be good to know which character encoding you use. Is it cp1252, as jEdit tells me?

I checked old files (1.2), and they have the same problem.

Best Regards,

Markus

Beta 1.3

In typography these fl and fi characters are called ligatures, and they are often substituted because they're visually better spaced. I would assume that somewhere along the line your word processor has replaced fl and fi with single ligature characters that aren't available in the fonts you're using. To my knowledge this isn't something Scrivener does, but I may be wrong. If it does, there'll probably be a preferences setting.
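Whether the ligatures are really stored in the text, rather than only drawn by the font, can be checked directly. A minimal Python sketch, assuming the text is available as a Unicode string (for example, read from the exported file with the right encoding); it looks for the Latin ligatures in the Alphabetic Presentation Forms block, U+FB00 to U+FB06. The helper name `find_ligatures` is hypothetical.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block (U+FB00..U+FB06).
LATIN_LIGATURES = range(0xFB00, 0xFB07)


def find_ligatures(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for every Latin ligature in text."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ord(ch) in LATIN_LIGATURES
    ]


sample = "The \ufb02uid was suf\ufb01cient."
for index, ch, name in find_ligatures(sample):
    print(index, name)
```

An empty result means the letters are stored as ordinary pairs and any ligature you see is purely a rendering choice of the font.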

Good luck!

Thanks for that very good hint. I checked the .txt file directly before processing it, so I am pretty sure it is a Scrivener issue.

I did a test with a new project and could not reproduce the problem. So I assume that an invisible character (perhaps indicating a wrong character encoding) is forcing this behaviour (I imported part of the text via copy and paste from HTML).

As a next step I will strip down my project (pretty large; more than 80,000 words overall) step by step to see which document makes Scrivener behave like that. If I manage to isolate the document, I will post a zip of the project here.

One thing that might indicate whether they exist as single characters in Scrivener would be to check the character count for anomalous counts in short sample texts. I'm not sure what best practice is for typographic ligatures. I've always assumed they should only be displayed by the text engine and not actually placed into the file as data unless the user specifically typed them in. I could be totally wrong about that, though. They most certainly shouldn't be ending up in plain-text files, as typographic conventions have no meaning in plain text, to the point of being antithetical.
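The character-count check works because a stored ligature is one code point standing in for two letters. A minimal Python illustration, assuming the word contains U+FB01 in place of "fi" (the variable names are only for the example):

```python
plain = "sufficient"
with_ligature = "suf\ufb01cient"  # "fi" stored as the single code point U+FB01

# Both can render identically in many fonts, but the character counts differ.
print(len(plain))          # 10
print(len(with_ligature))  # 9
```

So a short sample whose count comes out one lower per "fi" or "fl" than expected is a strong hint that ligature characters are in the data.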

As promised, I stripped down the project to only two documents. These still have the ligature problem. You can find them attached to this post for debugging purposes. The result of "Compile" is also in the zip file.

I hope that helps.
strip_for_debug.zip (292 KB)

And here are the options I used for export:
BTW: In my eyes the export font dialog in "Formatting" should be inactive if I export plain text, shouldn't it?

textoptions.png
separators.png
formatting.png

I think I narrowed it down:

When I use "Find" within the document where sufficient becomes suf?cient, I get a weird behaviour which, in my eyes, shows that it is an internal Scrivener problem with ligatures.

Searching for "suf", I can find the two occurrences of sufficient.
search_suf.png

Searching for “sufficient” (or suffi) I don’t find the two occurrences of sufficient within the text.
find_sufficient.png
To me that proves that Scrivener internally mangles the ligature even before it is exported. Might this be related to an RTF problem?
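This search behaviour is what you would expect if "fi" is stored as the single code point U+FB01: "suf" matches the ordinary letters in front of it, while "suffi" asks for two letters that are not there. A minimal Python sketch of that, plus a search that applies Unicode NFKC normalization first (NFKC expands compatibility ligatures such as U+FB01 into "f" + "i"). This is not how Scrivener's Find works; `normalized_find` is a hypothetical helper.

```python
import unicodedata

text = "It was suf\ufb01cient for the test."

print("suf" in text)    # True: the plain letters before the ligature match
print("suffi" in text)  # False: "fi" is one code point here, not two letters


def normalized_find(haystack: str, needle: str) -> bool:
    """Substring search after NFKC normalization, which expands compatibility ligatures."""
    return unicodedata.normalize("NFKC", needle) in unicodedata.normalize("NFKC", haystack)


print(normalized_find(text, "sufficient"))  # True
```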

Hi Louis, thank you for your wonderful posts; they are very helpful. This issue is ultimately related to RTF and is a Scrivener problem. The problem we have at the moment is that Qt, the C++ framework Scrivener is built on, uses a cut-down version of HTML as its editor engine. This means that when text is cut and pasted into Scrivener from HTML, Word etc., stray characters are often added and remain invisible in the Qt editor. This is the first problem, and it is a Qt problem that we need to fix. We have decided to bypass it altogether and modify the Qt source code so that all pasted text is first filtered through our own RTF parser. We have not started this work yet, but plan to soon. In the interim, the only way to remove these hidden characters is to paste the text into Notepad first and then copy it from there into Scrivener. Unfortunately, you'll lose all formatting and styling, but at least the text will be clean.
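For plain-text workflows, the kind of cleanup the Notepad round trip achieves can be sketched as a text filter. A minimal Python sketch of a hypothetical paste filter (not the RTF-parser route described above): it expands the Latin ligatures U+FB00 to U+FB06 into plain letters and drops invisible Unicode format characters (general category Cf), such as zero-width spaces, soft hyphens and byte-order marks. The function name and the choice of what to strip are assumptions.

```python
import unicodedata


def clean_pasted_text(text: str) -> str:
    """Hypothetical paste filter: expand Latin ligatures and drop invisible format characters."""
    out = []
    for ch in text:
        if 0xFB00 <= ord(ch) <= 0xFB06:
            # NFKC compatibility decomposition turns e.g. U+FB01 into "f" + "i".
            out.append(unicodedata.normalize("NFKC", ch))
        elif unicodedata.category(ch) == "Cf":
            # Zero-width spaces, soft hyphens, BOMs etc. are invisible in the editor.
            continue
        else:
            out.append(ch)
    return "".join(out)


print(clean_pasted_text("\u200bsuf\ufb01cient \ufb02uid\u00ad"))
```

Restricting normalization to the ligature block, instead of running NFKC over the whole text, avoids side effects such as superscript digits being turned into ordinary digits.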

The second issue is what happens on a save to disk. First, the editor text in memory is converted to RTF by our own RTF writer (as Qt does not support RTF out of the box). When that same file is loaded back into the editor, it is parsed by our RTF reader and then loaded into memory for the editor to display. We have many issues with both the reader and the writer at the moment and need to rewrite large chunks of the code, which we have started to do. Interestingly, this was one of the few pieces of work that we farmed out to an expert Qt programmer for 12 weeks to meet the Sept 25th deadline. We are paying for that now and are no longer using these 'expert' services. Instead we are fixing many design flaws ourselves. I hate the saying, but my father was right again when he said, 'If you want something done right, do it yourself.'
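For context on the writer side: an RTF file that declares the Windows-1252 code page (`\ansicpg1252`) can still carry characters outside that code page, because the RTF specification provides the `\uN` control word, where N is a signed 16-bit value, followed by a fallback character for readers that skip it (one character when `\uc1` is in effect). A writer that instead forces text into the code page would turn the ligatures into "?". A minimal Python sketch of the escaping rule, assuming `\uc1` and ignoring line breaks (which a real writer would emit as `\par`); this is not Scrivener's writer, and `rtf_escape` is a hypothetical name.

```python
def rtf_escape(text: str) -> str:
    """Escape text for an RTF body declared with \\ansicpg1252 (sketch, assumes \\uc1)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF's own control characters must be escaped
        elif ord(ch) < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            try:
                byte = ch.encode("cp1252")[0]
                out.append(f"\\'{byte:02x}")  # character exists in code page 1252
            except UnicodeEncodeError:
                # \uN takes signed 16-bit UTF-16 code units; '?' is the \uc1 fallback.
                data = ch.encode("utf-16-le")
                for i in range(0, len(data), 2):
                    unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                    out.append(f"\\u{unit}?")
    return "".join(out)


print(rtf_escape("suf\ufb01cient caf\u00e9"))
```

With this rule the ligature survives a save and reload as `\u-1279?`, and only a reader that ignores `\u` would fall back to "?".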

The good news is that all is not lost, and we understand the problem. It's not like we're scratching our heads wondering how to fix it. It's simply a matter of head down, bum up and getting through it. We have about three to four weeks ahead of us to remedy all the import and editor issues that many users are reporting, such as tables, weird line spacing, hidden characters, etc.

Anyway, thanks again for all your support.

Lee

Also, Louis, in the longer term (maybe soon after v1.0 is released) MultiMarkDown doesn't look like it will be all that difficult to implement for Windows/Linux, thanks to some great info from AmberV, which I think you mentioned in another post? I could really have done with something like this when I was struggling to write my own university dissertation in a text editor using LaTeX markup. All good things will come :slight_smile:

Lee

Dear Lee,

I am the one who has to say "thank you", since Scrivener, even in its current state, is better than anything else out there. Also, thank you for taking the time to explain things. That makes me feel even more comfortable with Scrivener (and motivated to contribute by reporting bugs).

I definitely don't want to put any pressure on you, since I have a complete working environment that gets proper LaTeX out of Scrivener with just one double-click (you'll find a short description here: https://forum.literatureandlatte.com/t/an-approach-to-exporting-mmd-to-latex/9122/1 ). So I don't actually need MMD export.

I just want to give you some hints to make things easier for you. Don't worry about me losing any work. Since I narrowed the problem down to a single document, I will manage to get everything I need out of Scrivener. Further, since I only use plain-text export, I don't care about losing formatting. Quite the contrary: I'd love to be able to strip all formatting from my text so that all 80,000 words use one font.

Markus

And finally, here is what I find in the RTF file:

Just a rough guess: might a different encoding fix some problems? The header of the RTF file tells me it is ansicpg1252. What character encoding does the Qt library use? Fixing that would probably be an easy one…
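Whether a different encoding would help can be tested per character. A minimal Python sketch, assuming the problem character is U+FB01: Windows-1252 and Latin-1 cannot represent it, while UTF-8 and UTF-16 can. Note that a UTF-8 export would keep the ligature as one character instead of writing "?", so tools looking for the letters "fi" would still miss it. The helper `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


LIGATURE_FI = "\ufb01"
for enc in ("cp1252", "latin-1", "utf-8", "utf-16"):
    print(enc, can_encode(LIGATURE_FI, enc))
```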

I just searched and found the following about the differences between ligatures in Mac OS and Windows:
The Mac keeps both the ligature and the underlying word; Windows does not. So there might be something useful about ligatures here:

blog.nella.org/?p=627

If not, just forget this post.

Markus

P.S.: Since I'm stuck, I'll have to stop debugging here.

Just to let you know: this bug can be marked as fixed in my eyes. All ligatures are now marked as spelling errors and are handled as one letter.