There is a problem when copying Chinese characters from a PDF exported by Scrivener. The document looks fine and everything appears normal, but when you copy text containing Chinese characters and paste it, you get unreadable characters. I have tried various options in the Compile Editor and the PDF formatting settings, and nothing worked.
I would think it’s an encoding issue. Scrivener should be using UTF-8. When you copy the text from the PDF, what are you pasting it into? That other software might be set to GB2312 or Big5, which would cause what you’re seeing.
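To illustrate the kind of mismatch being suggested here, the following is a minimal sketch in Python. It assumes the text is stored as UTF-8 and that the receiving program wrongly interprets those bytes as GB2312 or Big5. The sample string and the helper name `decode_as` are made up for the illustration and have nothing to do with Scrivener itself.

```python
# Sample Chinese text, invented for this illustration.
text = "中文测试"

# The bytes as a UTF-8 producer would write them.
utf8_bytes = text.encode("utf-8")


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes with the given encoding, replacing anything invalid."""
    return data.decode(encoding, errors="replace")


# Decoding with the right encoding round-trips; the wrong ones give garbage.
for enc in ("utf-8", "gb2312", "big5"):
    print(enc, "->", decode_as(utf8_bytes, enc))
```

If the pasted text looked like the GB2312 or Big5 lines printed here, a mismatch in the destination program would be a plausible cause. If it looks different in kind (empty, or symbols), the cause probably lies elsewhere.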
Why don’t you post a brief example PDF so that I (and others) can take a look?
What if you compile to an editable format, such as RTF or Word?
PDFs are visual documents. They aren’t really intended to be editable. Which means that if the PDF looks like you want it to, Scrivener has done its job. Dealing with text copied from a PDF is the responsibility of the destination application.
The unreadable characters appear regardless of which program is used for pasting, for example Safari or Spotlight on the Mac.
When I compile to RTF or Word and then save as PDF using Microsoft Word, the text is perfectly fine and can be copied and pasted as well. It only fails when exporting to PDF directly from Scrivener. It may be an encoding issue.
What version of Scrivener are you using, and what version of macOS? I am using an M1 MacBook Air running macOS 11.2.1 and Scrivener 3.2.2.
The PDF you sent has no searchable text-layer, so I had to scan it to make it searchable. I use “PDF OCR X enterprise edition” with Chinese language added. Having done that, I copied the paragraph on the second page and pasted it into TextEdit, saving it as “sample_text.rtf”. I don’t know why in the resulting file, at each line break there is a long string of unreadable characters, but they can be deleted.
I then quickly compiled your Scriv project—I had to assign section types—as “Sample Texts 2.pdf”. It has a searchable text-layer so I copied the same paragraph into TextEdit and saved it as sample_text_2.rtf.
Why do you need to copy from the PDF, rather than a Word or RTF document? Or directly from Scrivener, for that matter? As I said, PDF isn’t the best format if you need to get the text out to somewhere else.
That has been on my mind as well; however, I can imagine a situation where someone else might wish to copy and paste quotations from something that Miao publishes and butts up against this problem. And I can confirm it’s a problem with 3.2.2 running under 10.15.7.
As well as compiling Miao’s project, I tried compiling the Chinese document in one of my test projects to PDF and it had the same result. If you copy the text and paste it into TextEdit, nothing appears, though there are one or two lines of invisible characters, which TextEdit thinks are in the Apple Symbol font! Using Paste and Match Style has the same result apart from a ◆ at the end!
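For anyone who wants to see what is actually in a paste like that, here is a rough Python sketch that lists the code point, Unicode category and name of every character in a string. The function name `describe` and the sample strings are my own assumptions for the illustration. The "bad" sample is invented, not taken from the PDFs in this thread.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, category, name) for each character in text."""
    rows = []
    for ch in text:
        # Characters without a Unicode name (e.g. private-use) get a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", unicodedata.category(ch), name))
    return rows


# A correctly copied ideograph versus an invented example of a bad paste:
# one private-use character followed by a black diamond.
for sample in ("中", "\ue000\u25c6"):
    for row in describe(sample):
        print(row)
```

Real Chinese text should show up with category `Lo` and names starting with `CJK UNIFIED IDEOGRAPH`. Anything in the private-use (`Co`) or symbol (`So`) categories suggests the characters were lost before the paste, not mangled by the receiving program.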
I haven’t tried OCR’ing the PDF, as the app is on the MBA, not this iMac.
For me, the sample PDFs attached—compiled from your Scrivener projects above without any alterations—are both searchable and copiable, although I don’t know if they have the veracity and integrity you are seeking.
If they do meet your needs, the workaround is to compile to print, but when the print dialogue opens, choose the option to create a PDF rather than send the pages to a printer. This sidesteps Scrivener’s PDF settings and uses the ones provided by the operating system instead.
Using the print engine is a good workaround. It works. Mind you, I always compile to RTF, opening the document automatically in Nisus Writer Pro, as it is there that I run spell check, check pagination, set endnotes to end of section, etc., as necessary. It seems to me there is less flexibility in layout when compiling directly to print or PDF.
But the fact remains that the Chinese in Miao’s PDF does not copy properly, nor does my text when compiled using 3.2.2 under 10.15.7. My PDF compiled on my MBA with 11.2.1 does not exhibit the problem.
I have just checked another way, on the iMac running 10.15.7: compiling my text to RTF, opening it in NWP (note that NWP is built on the same Apple text engine that Scrivener uses), printing it from NWP to PDF, copying the text, and pasting it into TextEdit. The result? Exactly the same. In other words, this is an Apple bug in Catalina, and maybe in earlier versions of macOS, not a Scrivener issue.
It also means that unless Miao can update their system to Big Sur, they will have to work out what is best with Word. I don’t have Word, but I’ve checked on the iMac that compiling to DOCX (which uses KB’s own conversion code), opening in Pages, printing to PDF, then copying and pasting into TextEdit works, though Pages screws up the fonts, turning everything into Arial Unicode MS, even though all the fonts I use are available in Pages!
To return to this topic, I posted the problem on the Nisus Writer Pro forum as it affects PDFs created by NWP on my iMac running 10.15.7. Another forum member running NWP on 10.14.6 confirmed the problem. Martin of Nisus replied as follows:
So, anyone running any version of macOS before Big Sur would need to check whether it affects their compiled PDFs and, if so, switch to compiling to RTF or DOCX, opening the result in Word or LibreOffice, and creating the PDF from there. Clearly, from my testing, even on Big Sur there are some Chinese fonts that still end up with the wrong encoding.
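As a quick way of doing that check, here is a hedged Python sketch. It compares the share of Chinese ideographs in the original text with the share in whatever comes out of the PDF when you copy it. The function names, the 0.2 tolerance and the sample strings are all assumptions of mine. It is only a heuristic: it covers the main CJK Unified Ideographs block plus Extension A, and it will not catch a paste where the characters are Chinese but the wrong ones.

```python
def is_cjk(ch: str) -> bool:
    """True for CJK Unified Ideographs and Extension A (a deliberate simplification)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk(ch) for ch in chars) / len(chars)


def copy_looks_intact(source: str, pasted: str, tolerance: float = 0.2) -> bool:
    """Heuristic: the pasted text should be about as 'Chinese' as the source."""
    return abs(cjk_ratio(source) - cjk_ratio(pasted)) <= tolerance


# Invented samples: the original text, a good paste, and a bad paste.
source = "中文测试 sample"
print(copy_looks_intact(source, "中文测试 sample"))
print(copy_looks_intact(source, "\ue000\ue001\u25c6 sample"))
```

On a Mac, the `pbpaste` command prints the current clipboard contents, so the pasted text can be saved to a file from Terminal and fed to a script like this.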