Unreadable text when copying Chinese characters from a PDF compiled by Scrivener

There is a problem when copying Chinese characters from a PDF exported by Scrivener. The document looks fine on screen and everything appears normal. But when you copy text containing Chinese characters and paste it, you get unreadable characters. I have tried various options in the Compile Editor and the PDF formatting settings, and nothing worked.

Any suggestions? Thank you very much indeed.

What font are you using in the compiled document? If it is not a Unicode font (like the ugly Arial), I’d recommend trying one to see if it makes any difference.

I would think it’s a coding issue. Scrivener should be using UTF-8. When you copy the text from the PDF, what are you pasting it into? That other software might be set to GB2312 or Big5, which would cause what you’re seeing.
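As an illustration of that kind of encoding mismatch (a minimal Python sketch, not a diagnosis of this particular PDF): the same UTF-8 bytes read back as GB2312 or Big5 come out as different, unreadable characters. The sample string and the `misdecode` helper are made up for the example.

```python
# Hypothetical sample text; any Chinese string would do.
original = "中文测试"


def misdecode(text: str, wrong_encoding: str) -> str:
    """Encode text as UTF-8, then decode the bytes with a different encoding."""
    raw = text.encode("utf-8")
    # errors="replace" substitutes U+FFFD for byte sequences that are
    # invalid in the wrong encoding, instead of raising an exception.
    return raw.decode(wrong_encoding, errors="replace")


if __name__ == "__main__":
    for enc in ("gb2312", "big5"):
        print(enc, "->", misdecode(original, enc))
    # Decoding with the matching encoding returns the text unchanged.
    print("utf-8 ->", original.encode("utf-8").decode("utf-8"))
```

If the pasted text looks like the GB2312 or Big5 lines rather than the last one, the receiving application is probably assuming the wrong encoding.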

Why don’t you post a brief example PDF so that I (and others) can take a look?

:slight_smile:

Mark

Try pasting into TextEdit first.
If it looks OK there, Cmd+A, Cmd+C, then paste into Scrivener.
If it still looks bad in Scrivener, it’s an encoding problem.

What are you pasting into?

What if you compile to an editable format, such as RTF or Word?

PDFs are visual documents. They aren’t really intended to be editable. Which means that if the PDF looks like you want it to, Scrivener has done its job. Dealing with text copied from a PDF is the responsibility of the destination application.

Katherine

Dear Mark,
I have uploaded the sample documents (Scrivener project and PDF), but it seems that the unreadable characters inevitably show up no matter which application the text is pasted into.

What is the problem here, I wonder. Thank you.

Hi Katherine,
The unreadable characters appear regardless of which program is used for pasting, for example Safari or Spotlight on the Mac.

And when I compile to RTF or Word and then save as PDF using Microsoft Word, the text is perfectly fine and can be copied and pasted as well. It only fails when the PDF is exported directly from Scrivener. Perhaps it is an encoding issue.

Hello Miao,

What version of Scrivener are you using, and what version of MacOS? I am using an M1 MacBook Air running MacOS 11.2.1 and Scrivener 3.2.2.

The PDF you sent has no searchable text-layer, so I had to scan it to make it searchable. I use “PDF OCR X enterprise edition” with Chinese language added. Having done that, I copied the paragraph on the second page and pasted it into TextEdit, saving it as “sample_text.rtf”. I don’t know why in the resulting file, at each line break there is a long string of unreadable characters, but they can be deleted.

I then quickly compiled your Scriv project—I had to assign section types—as “Sample Texts 2.pdf”. It has a searchable text-layer so I copied the same paragraph into TextEdit and saved it as sample_text_2.rtf.

The 2 RTFs are in “Chinese samples.zip”.

[attachment=0]Sample Texts 2.pdf[/attachment]
[attachment=1]Chinese samples.zip[/attachment]

I’ll try compiling your project on the iMac running 10.15.7 when I have a moment.

:slight_smile:

Mark

So, it’s something to do with your system?

Why do you need to copy from the PDF, rather than a Word or RTF document? Or directly from Scrivener, for that matter? As I said, PDF isn’t the best format if you need to get the text out to somewhere else.

Katherine

That has been on my mind as well; however, I can imagine a situation where someone else might wish to copy and paste quotations from something that Miao publishes and butts up against this problem. And I can confirm it’s a problem with 3.2.2 running under 10.15.7.

As well as compiling Miao’s project, I tried compiling the Chinese document in one of my test projects to PDF and it had the same result. If you copy the text and paste it into TextEdit, nothing appears, though there are one or two lines of invisible code, which TextEdit thinks is in the Apple Symbol font! Using Paste and Match Style has the same result apart from a ◆ at the end!
[attachment=0]scrivtest_2.scriv.zip[/attachment]
[attachment=1]scrivtest_2.pdf[/attachment]
I haven’t tried OCR’ing the PDF, as the app is on the MBA, not this iMac.
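For anyone who wants to see what actually lands on the clipboard, a rough Python sketch along these lines lists the code point, Unicode category and name of each pasted character. The `describe` and `looks_like_cjk` helpers are made up for this example, and the private-use character in the usage line is only an assumed stand-in for "invisible code", not something I have confirmed the PDF produces.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (code point, Unicode category, name) for each character."""
    rows = []
    for ch in text:
        code = f"U+{ord(ch):04X}"
        category = unicodedata.category(ch)  # e.g. "Lo" = letter, "Co" = private use
        name = unicodedata.name(ch, "<no name>")
        rows.append((code, category, name))
    return rows


def looks_like_cjk(text: str) -> bool:
    """True if any character is in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


if __name__ == "__main__":
    # Replace this string with text pasted from the PDF.
    pasted = "中\ue000"
    for row in describe(pasted):
        print(row)
    print("contains CJK ideographs:", looks_like_cjk(pasted))
```

If the pasted text contains no characters in the CJK range at all, the receiving application is not at fault; the wrong characters were already in the text extracted from the PDF.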

:slight_smile:

Mark

Many thanks, Mark, for helping to confirm this issue.
I am using Scrivener 3.1.4 on Mojave and I see the same problem. Now I understand that it is not related to my Mac settings. Thanks for trying.

Hi Katherine,
This is actually an important functionality. Being able to copy-paste means being able to search in a PDF!!

I know that going from RTF to PDF via MS Word can produce a searchable PDF, but there are drawbacks. During PDF conversion, Word offers two options. The first, “Best for electronic distribution and accessibility”, lets you create a PDF with an outline and index, BUT loses the fonts and formatting. See here https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_mac-mso_mac2016/best-for-electronic-distribution-and-accessibility/448f6811-66e5-4ffc-9fa0-c62a0cb771f5
The other option, best for printing, retains the formatting but loses the PDF outline.

Scrivener can output a PDF with an outline and retain the right formatting at the same time. For my needs, a long paper, I need a PDF that both has an outline and is searchable.

Cheers,
Miao

Miao and Mark.

For me, the sample PDFs attached—compiled from your Scrivener projects above without any alterations—are both searchable and copiable, although I don’t know if they have the veracity and integrity you are seeking.

If they do meet your needs, the workaround is to compile to print, but when the print dialogue opens, choose the option to create a PDF rather than send the pages to a printer. This sidesteps Scrivener’s PDF settings and uses the ones provided by the operating system instead.

[attachment=1]Miao.pdf[/attachment]

[attachment=2]Mark.pdf[/attachment]

[attachment=0]Print.jpg[/attachment]

Hi Miao and MERX

Using the print engine is a good workaround. It works. Mind you, I always compile to RTF, opening the document automatically in Nisus Writer Pro, as it is there that I run spell check, check pagination, set endnotes to end of section, etc. as necessary. It seems to me there is less flexibility in layout when compiling directly to print or PDF.

But the fact remains that there is a problem that the Chinese in Miao’s PDF is not copying properly, nor is my text when compiled using 3.2.2 under 10.15.7. My PDF compiled on my MBA with 11.2.1 does not exhibit the problem.

I have just checked another way, on the iMac running 10.15.7: compiling my text to RTF, opening it in NWP (note that NWP is built on the same Apple text engine that Scrivener uses), printing it from NWP to PDF, copying the text and pasting it into TextEdit. The result? Exactly the same. In other words, this is an Apple bug in Catalina, and maybe in earlier versions of MacOS, not a Scrivener issue.

It also means that unless Miao can update to Big Sur, he or she will have to work out what is best with Word. I don’t have Word, but I’ve checked on the iMac that compiling to DOCX (which uses KB’s own conversion code), opening in Pages, printing to PDF, then copying and pasting into TextEdit works, though Pages screws up the fonts, turning everything into Arial Unicode MS, even though all the fonts I use are available in Pages!

:slight_smile:

Mark

To return to this topic, I posted the problem on the Nisus Writer Pro forum as it affects PDFs created by NWP on my iMac running 10.15.7. Another forum member running NWP on 10.14.6 confirmed the problem. Martin of Nisus replied as follows:

So, anyone running any version of MacOS before Big Sur would need to check if it affects their compiled PDFs, and if so, switch to compiling to RTF or DOCX, opening in Word or LibreOffice and creating the PDF from there. Clearly, from my testing, even on Big Sur there are some Chinese fonts that still end up with wrong coding.

Mark