PROBLEMS WITH IMPORTED CONVERTED PDF FILES

DrBaker · July 9, 2015, 8:14pm

I just purchased and downloaded Scrivener for Windows to my Dell Vostro laptop yesterday. I use Windows 7 Professional and Microsoft Word 2013.

I am a writer and a published author. I heard other authors say good things about Scrivener so I wanted to try it.

I am a veterinarian and I am in the process of writing a book about Allergies In Dogs. Most of my research documents are from scientific journals that I download in PDF format from the internet. I then use Adobe to convert the PDF files to Word (.docx) documents so that I can read the converted PDF documents in Word and copy and paste relevant material into another document that I will refer to when I eventually write each chapter.

When I imported converted PDF documents (in Word .docx format) into the Research Folder in Scrivener, the documents were all messed up. None of the images were there and some of the lines of text were on top of each other which made them illegible. The two columns of text became one column of text which resulted in making the text out of place and nonsensical.

I then used Adobe to convert the original PDF files into .rtf files and imported the .rtf files but I got the same results.

I cannot use PDF documents for my reading and research because I am not able to copy and paste relevant information from PDF documents into a new document to be used as reference when writing my book.

This seems to be a major flaw in Scrivener. I thought that it was possible to import any kind of file or image into the Research Folder, but this has not proven to be the case.

Can you please let me know if there is any way to successfully import converted PDF documents into the Research Folder in Scrivener while keeping images, graphs, and columns of text intact?

Briar_Kit · July 9, 2015, 8:30pm

Hello

Not a Windows user, but I think it might be a converter problem. Do the converted PDFs open in Word okay/cleanly?

This post suggests there are settings to tweak and retry…

https://forum.literatureandlatte.com/t/apostrophes-doubled-during-import/30342/1

I'm sure a Windows user will be able to help you out.

devinganger · July 9, 2015, 8:47pm

If you can give me a link or source for one of the PDFs, I can try the same thing – I have Office 2013 as well. I’ve found that its inbound PDF conversion sometimes leaves a lot to be desired.

DrBaker · July 9, 2015, 9:52pm

Briar, in response to your question, “Do the converted PDFs open in Word okay/cleanly?”: yes, they usually do. Occasionally they get messed up, but the conversion usually works fine.

I checked out the link you sent about Import Converters in Scrivener, Briar. I followed the instructions for importing a DOCX converter via MS Word 2007-2013, clicked OK and Apply, and then tried importing the converted PDF documents again, but it did not work any better than it did before. The images in the DOCX document (converted from a PDF document) were all stripped from the document, and the two columns of text had been converted to one column of text, which made the text nonsensical.

Devin, here is a link to one of the articles I have downloaded for my research entitled “Effects of Essential Oils and Polyunsaturated Fatty Acids on Canine Skin Equivalents.” On the right side of the screen, click on “Full Text PDF” to get a PDF file.

hindawi.com/journals/jvm/2013/231526/

I pay for, and use, Adobe ExportPDF to convert the PDF files to Word (.docx) files.

The documents converted by Adobe from PDF to .docx are the documents that I imported to Scrivener (which are completely messed up in Scrivener but are fine in Word).

Thank you in advance to anyone who can sort out this problem for me!

MimeticMouton · July 9, 2015, 10:48pm

If you’re importing as a .docx file, try changing the converter used. Go to Tools > Options, Import/Export and click “Import Converters”, then select “DOCX” from the left pull-down menu and “Microsoft Office” from the right. Scrivener doesn’t support multi-column text or text wrapping around images, so you may still need to make some adjustments to the files in Word before importing to Scrivener so that you can ensure the flow is correct.

Importing the files as PDFs would maintain their original layout, and from within Scrivener you can choose to open them in an external application (whatever your default PDF program is) via Documents > Open > Open in External Editor; a button for this will appear on the right of the editor footer when a PDF is open in the editor. This would let you view the files directly in Scrivener but have easy access to enhanced PDF tools as needed.

DOCX files will be converted to Scrivener’s internal RTF format when importing, but you could choose instead to add the file as a reference rather than importing it, which would let you open it in Word. The second tab in the footer of Scrivener’s inspector will open the references in the bottom section of the inspector; you can toggle between document references and project references via the section’s header bar. Document references are tied to a specific binder document, so will only be visible in the inspector when that document is selected. Project references are global and can be viewed regardless of the binder selection. To add a reference you can simply drag the file into the list from Windows Explorer, or you can click the “+” button in the reference header to browse for the file and add it. Double-clicking the reference’s icon in the inspector will open it in the default program. With the Windows Snap feature, you should be able to easily work in your Scrivener project and the referenced Word file side by side, copying and pasting as needed.

devinganger · July 10, 2015, 1:08am

So, I loaded the PDF supplied into Word 2013 and re-saved it as a DOCX, performing the conversion there. It seemed to retain most of the formatting, including the two-column layout.

I then made sure Scrivener was set to use Word for DOCX import, as MM states just above, and imported. It seemed to handle the double-column to single-column transition nicely – it looked like the text was flowing properly – and all of the image captions were imported. Only a handful of the images were imported, however.

I wonder if converting the DOCX to a single-column document before importing to Scrivener would help with that.

Note: when using the default internal Scrivener converter, it flat-out errored out on loading the converted DOCX – said the XML structure couldn’t be read.

DrBaker · July 10, 2015, 2:04am

Hi Jennifer, thanks for your thoughtful reply. I already tried Tools > Options, Import/Export, clicking “Import Converters”, then selecting “DOCX” from the left pull-down menu and “Microsoft Office” from the right, and it did not help. I tried converting the two columns of text in Word into one column of text as you suggested, but the result was not good.

Windows Snap did not work on my computer. I am able to manually resize each program (Word and Scrivener) and place them side by side on my screen, but I really wanted to work within Scrivener alone for ease of use.

Devin, you get the prize for the best result so far. Thank you. I used Word to convert the PDF to a DOCX like you did and got the best conversion yet when I imported it into Scrivener. It was actually readable and some of the images were retained (but not all of them).

devinganger · July 10, 2015, 2:06am

Can you manually copy and paste the images from the DOCX document into the correct spots in the imported document in the Research folder? I know it’s more high-touch than would be convenient, but if it gets the data in…

Briar_Kit · July 10, 2015, 7:09am

On a Mac, the text in the sample PDF is selectable. Same in Windows? If yes, why not import that and copy and paste direct from the PDF?

Noticed that the link allows you to get the full text as HTML. Easier and cleaner to import that rather than going through PDF > Word > Scrivener?

Hope you find a solution that works for your needs.

DrBaker · July 12, 2015, 9:01pm

Hi Briar,

Thanks for your suggestions. I tried importing a PDF into the Research folder in Scrivener but it did not allow me to copy and paste from the PDF file.

I imported one of my research articles as an HTML file and that worked nicely. However, I then went to some of my other research articles online to see if I could get them as HTML files, and I found out that most of my research articles are only available in PDF.

It appears that it will not work out for me to import my research articles into Scrivener for use within Scrivener. I was looking forward to using the split screen feature for reading and then copying and pasting relevant material into another document for reference when writing my book. I have been using the View Side By Side feature in Word 2013 when I am working on my book for this purpose and this works out well for me.

So for authors of nonfiction books who must use extensive research materials, it does not appear that Scrivener provides the platform that we need. This is very unfortunate. I was looking forward to using Scrivener while writing my book.

I emailed Scrivener Technical Support for help and here is the reply that I received:

[i]In general, it is possible to import just about any file type into the research area, although not all file types will be viewable in Scrivener. However, the major exception is editable text file types: specifically, RTF, DOC, DOCX, and ODT. These file types are all put through Scrivener’s converter so that they are changed into RTF files (an existing RTF file should mostly be imported straight in). This is done specifically to make them editable (Scrivener would have no way to display a native Word file, let alone let you edit it). There is no way to disable this automatic conversion.

Normally this isn’t a problem. However, a PDF file that has been de-constructed into a Word file isn’t going to be structured like a true Word file, and I suspect that is what is causing the problem here. In truth, PDF files are deliberately designed not to be deconstructed. They are intended as a read-only “distribution format.” While there are tools that attempt to backward engineer a PDF file into an editable format, in my experience the results are less than satisfactory. Text tends to have “hard breaks” at the line breaks, and text that was displayed in “columns” or other non-linear arrangements can wind up scattered out of order.

Regarding the images, I’m afraid that is a limitation of the importer. The importer is only intended as a brute-force way of getting your text content into Scrivener. But it has limitations, and one of them is that it isn’t able to import images.[/i]

If anyone has any other suggestions for me, I would love to hear them. Thank you.

MimeticMouton · July 13, 2015, 12:30am

What type of images are in the document? Word supports more embedded types than Scrivener, but it isn’t a given that the import will not include them–most embedded images do come across into Scrivener. Are these some form of vector graphics? You may be able to convert them into a JPG or PNG image in the Word file and successfully import them into Scrivener that way.

It is also possible to copy text from PDFs; you’ll need to use the crosshairs to get a block selection (the PDF tool currently used doesn’t have a simpler method of selecting a line of text like you would in Adobe), then right-click to copy the text. You’ll likely still face the general limitations of the PDF format as noted in the last post, with hard line breaks and so forth (I’m assuming the Word conversion is to get around that), but maybe it’s a worthwhile trade-off for viewing the PDFs directly.
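
For the hard line breaks mentioned above, one possible workaround outside Scrivener is to clean up the copied text before pasting it into a notes document. The following is only a minimal Python sketch, not a Scrivener feature: the function name `unwrap_hard_breaks` and the sample text are made up for illustration, and it assumes that paragraphs in the copied text are separated by blank lines, which will not hold for every PDF.

```python
import re


def unwrap_hard_breaks(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph break; every other
    line break is treated as a leftover from the PDF layout.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs on blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse all whitespace runs to one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    # Placeholder text standing in for something copied out of a PDF.
    sample = "First line of a\nparagraph copied\nfrom a PDF.\n\nSecond\nparagraph."
    print(unwrap_hard_breaks(sample))
```

This does not repair text that came out in the wrong order from a two-column layout; it only removes the line breaks inside paragraphs.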