Copy & paste PDF text

This morning I’ve been copying snippets of text from a PDF document and pasting them into my draft assignment. This was fine for the first few times but now I am getting gobbledegook.

[Successful copy and paste]
Churchwardens also had to sign the bishops’ transcripts, the annual copies of the parish registers that were sent each year to the bishop following the Act of 1598
[now does this]
#D%&?D=6&,(/’“6@’$“D6,”+$”’.I/"+D(">.‘D$C’a"+&6/’?&.C+‘J"+D(“6//%6@”?$C.(’"$)"+D(“C6&.'D” &(I.’+(&’" +D6+" =(&(" '(/+" (6?D" -(6&" +$" +D(" >.'D$C" )$@@$=./I" +D(" !?+" $)" M\UW



In my experience PDFs can contain all kinds of junk. Sometimes the OCR hasn’t worked very well (in the case of scanned documents) and sometimes there seem to be problems with text encoding. You could try pasting some copied text into TextWrangler (free) to see whether you get any legible results. It can open and save in various text encodings, so that might give you some mileage. In any case, I’d say it’s 99% likely to be the PDF, not Scrivener, so not technically a Scrivener bug. Of course, if you mean that the two pieces of text in your post are exactly the same passage copied and pasted at different times, then something has probably happened while the text was on the clipboard, which I imagine is just the standard OS clipboard, so once again not something that Scrivener would control. But I may be wrong.

Cheers, Martin.
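For anyone who would rather script the "try various encodings" idea than click through an editor, here is a minimal sketch in Python. It is not what TextWrangler does internally; the sample phrase, the list of encodings and the function name `try_encodings` are all my own assumptions for illustration.

```python
def try_encodings(raw: bytes, encodings=("utf-8", "cp1252", "mac_roman", "ascii")):
    """Decode the same bytes under several encodings.

    Returns a dict mapping encoding name to the decoded text,
    or to None when the bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: a phrase with a curly apostrophe, saved as UTF-8.
original = "the bishops\u2019 transcripts"
raw = original.encode("utf-8")

for name, text in try_encodings(raw).items():
    print(f"{name:10} {text!r}")
```

Only one of the candidates gives back the original phrase; the others either fail outright or turn the apostrophe into junk. If none of the encodings you try produces legible text, the problem is probably not a plain encoding mismatch.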

That gobbledegook looks to me like graphic code. Was this the same PDF, and if so, was it the same page as the previous copy/pastes? If it’s all from the same PDF, it would seem that part of the original was in text-PDF format, but that bit is in graphic-PDF format. There are two ways that could happen: the OCR process didn’t include the page in question; or whoever created the PDF you are working from copied that bit in from a graphic-PDF, which then doesn’t show up as visibly different in the PDF s/he produced.

I would suggest the solution would be to try running OCR over that page/part of the text to make it copyable as text.


Another thought.

Is the weird part in exactly the same font as the parts that paste properly? I ask because I have seen encoding issues like this. I work with Chinese, and the standard Chinese encodings used on Windows (GB2312 or GB18030; I believe that where MS has abandoned these they use UTF-16, where Apple uses UTF-8) include the glyphs of the Roman alphabet, and in the most commonly used Chinese font these look just like Times New Roman. There are subtle differences (the glyphs are generally badly kerned, and the punctuation spacing can be a give-away), but, and it’s a big but, the internal codes for the glyphs are totally different! If such text is imported onto a system that doesn’t support the Chinese encoding in question, the result looks like that. So, did the creator of the PDF copy that part of the text in from some other document that was actually written in a two-byte CJK (Chinese, Japanese, Korean) language or an RTL (right-to-left) language, using the Roman font built into that? If so, there’s not much you’re going to be able to do about it, I’m afraid.
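The idea that glyphs can look right on the page while carrying different internal codes can be checked against the sample in the first post. Lining up the tail of the good paste with the tail of the garbled one, each garbled character seems to stand for exactly one real character ("+D(" for "the", "$)" for "of", and so on). Here is a minimal sketch that builds that substitution table and applies it. It assumes the double quotes in the garbled paste stand in for spaces (I have dropped them) and that the forum has not altered the other characters; the names `build_map` and `decode` are mine.

```python
PLAIN = "that were sent each year to the bishop following the Act of 1598".split()
# Raw string so the backslash in the last word stays a literal backslash.
GARBLED = r"+D6+ =(&( '(/+ (6?D -(6& +$ +D( >.'D$C )$@@$=./I +D( !?+ $) M\UW".split()


def build_map(garbled_words, plain_words):
    """Build a garbled-to-plain character table, failing on any inconsistency."""
    mapping = {}
    for g_word, p_word in zip(garbled_words, plain_words):
        if len(g_word) != len(p_word):
            raise ValueError(f"length mismatch: {g_word!r} vs {p_word!r}")
        for g, p in zip(g_word, p_word):
            # setdefault stores p on first sight, otherwise returns the earlier value
            if mapping.setdefault(g, p) != p:
                raise ValueError(f"{g!r} maps to both {mapping[g]!r} and {p!r}")
    return mapping


def decode(text, mapping, unknown="_"):
    """Translate garbled text, marking characters missing from the table."""
    return "".join(c if c.isspace() else mapping.get(c, unknown) for c in text)


mapping = build_map(GARBLED, PLAIN)
print(decode(" ".join(GARBLED), mapping))
# First word of the garbled paste, which was not used to build the table:
print(decode(r"#D%&?D=6&,(/'", mapping))
```

The table builds without any conflict, and the unseen first word comes out as `_h_rchwar_ens`, which is recognisably "Churchwardens" with three characters still unknown. A consistent one-to-one swap like this is what I would expect from a font whose character map is missing or wrong in the PDF, rather than from a scanned image, though that is a guess from one sample.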


The text I was copying and pasting was from the same document. In fact, when the problem started to happen, I went back to text I had previously copied and pasted successfully and got gobbledegook when I tried again.
Martin: I downloaded TextWrangler but didn’t have any success (could be operator error on my part).
Mark: It is an English document, part of my genealogical studies. I don’t know how to do the OCR thing.

Thanks, guys, for your suggestions.


a Kiwi in Ireland

A follow-up:
I deleted the PDF I was having trouble with, then opened a very old application called MacLinkPlus Deluxe (had it since Adam). :slight_smile: I used it to “convert” the original PDF into whatever (another PDF?), saved the result, then imported it into Scrivener. Copy and paste worked a dream. Hmmm, thought I… so I imported the original PDF and copied and pasted from it. Guess what? It works again. :astonished: Odd, but at least I have saved myself some typing.

Glad you got it sorted.

MacLink Plus, eh? I had that for a long time, and could do with it now. I get sent the odd .wps file (Microsoft Works) and nothing I have will read them … I don’t have Weird as I’m a virtually Microsoft-free Zone. Pain in the neck! One of the senders is my 93-year-old father-in-law, and there’s no way he’ll change to something sensible!