New user has two questions, please

  1. Is there a way on here to take a few hundred Word files and merge them all into one document without opening each one?

  2. Is there a way to check a long document for duplicate sentences or passages (not by entering each one individually, but by scanning the whole document and finding the duplications)?

Thanks for any guidance you can give a new user!

1a) You can just drag-and-drop Word documents onto the Binder area of a Scrivener project to import them. You probably want to drop those into the Draft folder of the project.

1b) You do NOT really want to merge a few hundred Word docs into a single document. That would be terrible. What you probably want to do is import all those documents into a single Scrivener project. Inside the project, each will continue to exist as a separate item, though you can merge some of them or split some of them up as you wish once they are in there. And, of course, using Scrivenings mode you can look at any collection of them as a single document whenever you want (and Compile them for output that way, too).

  2. Not that I know of.

-gr

Thank you for your response, that’s exciting news!

If anyone does know how to search for duplicate passages or sections in a long document, that would be very helpful to hear about!

Since you are starting with Word documents, might Word’s compare documents function provide the help you need? It is really meant for versions of the same document, so there is a limit to how far a chunk of text can have moved before Word stops recognizing it as the same text.

That might not be enough for your purposes. Your framing of the question suggested maybe these repetitions show up in dribs and drabs and hence you might really need, literally, something that scoured around for repeated text. That kind of functionality is the sort of thing plagiarism checking software does. If you have access to some suitable plagiarism-checking software, you might think about whether that would do what you want. (If it is a service, you might, of course, be simultaneously submitting your docs to a database of texts against which other plagiarism checks would be done – and you might not want that.) Just free thinking aloud here.
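For what it is worth, here is a rough sketch of that "scour for repeated text" idea using only Python's standard library. It is not how any real plagiarism checker works. It assumes the document has been exported to plain text and split into a list of paragraphs; the function name `similar_pairs` and the 0.85 threshold are made up for illustration. It compares every paragraph with every other one, so it will be slow on a very long document.

```python
from difflib import SequenceMatcher


def similar_pairs(paragraphs, threshold=0.85):
    """Return (i, j, ratio) for each pair of paragraphs that look alike.

    The threshold is an arbitrary illustrative value: 1.0 means identical,
    lower values allow more differences between the two paragraphs.
    """
    pairs = []
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            matcher = SequenceMatcher(None, paragraphs[i], paragraphs[j], autojunk=False)
            ratio = matcher.ratio()
            if ratio >= threshold:
                pairs.append((i, j, round(ratio, 2)))
    return pairs


if __name__ == "__main__":
    sample = [
        "The storm broke over the harbour at dawn.",
        "A completely different line about breakfast.",
        "The storm broke over the harbor at dawn.",
    ]
    for first, second, ratio in similar_pairs(sample):
        print(first, second, ratio)
```

Unlike an exact-match search, this also catches passages that were lightly reworded between versions, which may or may not be what you want.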

gr

If you’re on a Mac, you might have a look at DEVONthink Pro. It’s a full-text database, and as such can find both true “duplicate” and merely “similar” documents. It’ll also do a “concordance,” showing the most frequent words in the database and where they are found.

On the other hand, since I don’t know exactly what you’re looking for, I don’t know if DTP’s “fuzzy search” is sophisticated enough to find it. And if you don’t need a full text database for other reasons, it’s a pretty big hammer with a pretty big learning curve to experiment with.

Katherine

Thank you all!

What I’m looking for specifically: I created one large 800-page file of material from multiple versions of my book, and now I want to go through it and see where I accidentally duplicated sections, paragraphs, etc. So it would mean scouring through 800 pages and weeding out the duplications.

Obviously we don’t know what kind of book or how many versions or your rationale for the search. So these are simply queries from another writer with multiple versions of the same book on file.

Is it necessary to remove duplications?
Their presence is unlikely to be challenged unless you plan to publish more than one version.

Won’t you have to re-write any duplications?
Removing a section without re-write may disrupt the narrative.

Might duplication of a section not suggest that it is a version you genuinely favor, and is particularly well done?

ps

Since you are talking about multiple versions of the same book, Word’s compare function can help you. But stacking all the material into one document is counterproductive for this purpose. What you want is a separate document for each version. Then you can compare them two at a time using Word’s compare-documents function (which produces a third, marked-up document).

But maybe what you had at the outset was not multiple separate versions of your whole book but, more likely, multiple re-workings of various parts of it in multiple Word docs, and your 800-page doc is a kind of shuffling together of all of them. If that is what has happened, then Word’s compare function will not be so helpful at this point.

If what you are trying to do is bring order to a snarly mess of overlapping versions written and rewritten over time, then the following story may be of some use to you.

I hope what you are dealing with is less snarly than this case was, but maybe this story will give you some ideas that might be useful in bringing your own unwieldy text situation to heel.

Best,
gr

Without buying additional software, this might help if you are searching for duplicate passages…

  1. copy entire text to a single document

  2. sort document by paragraphs alphabetically, allowing you to scan read the text to identify duplicates in the copy file and to make edits in the original(s)
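If you are comfortable running a small script, the same sort-and-scan idea can be sketched in Python. This assumes you have saved the copy as plain text with one paragraph per line; the function name `find_duplicate_paragraphs` is just something invented for this example. It only finds paragraphs that match exactly, character for character.

```python
from itertools import groupby


def find_duplicate_paragraphs(text):
    """Return (paragraph, [paragraph numbers]) for every exactly repeated paragraph."""
    # Treat every non-empty line as one paragraph (assumption: plain-text export).
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # Pair each paragraph with its position, then sort alphabetically so that
    # identical paragraphs end up next to each other, as in the manual method.
    indexed = sorted((para, number) for number, para in enumerate(paragraphs, start=1))
    duplicates = []
    for para, group in groupby(indexed, key=lambda pair: pair[0]):
        numbers = [number for _, number in group]
        if len(numbers) > 1:
            duplicates.append((para, numbers))
    return duplicates


if __name__ == "__main__":
    sample = "Alpha.\n\nBeta.\nAlpha.\nGamma.\nBeta.\n"
    for para, numbers in find_duplicate_paragraphs(sample):
        print(numbers, para)
```

The paragraph numbers it reports count non-empty paragraphs from the top of the copy, so you can use them to find the repeats in the original.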

If you want to compare on a sentence level…

  1. copy entire text to a single document

  2. search for full stops and replace them with full stops and a single paragraph return

  3. search for question marks, exclamation marks, etc, and replace them with question marks, exclamation marks, etc and a single paragraph return. If you end sentences with other marks, such as closing quotation marks, you can search for the appropriate marks and a space and replace them with the same marks and a single paragraph return. In this way, you will chop your copy document into single sentences

  4. sort document by paragraphs alphabetically (each sentence will now constitute a paragraph for your needs), allowing you to scan read the text to identify duplicates in the copy file and to make edits in the original(s)
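The sentence-level steps above can also be sketched as a short Python script, again assuming a plain-text copy of the document. The names `split_sentences` and `repeated_sentences` are invented for this example. Like the search-and-replace method, it splits naively after a full stop, question mark, or exclamation mark (optionally followed by a closing quote or bracket), so abbreviations such as "Dr." will be chopped in the wrong place.

```python
import re
from collections import Counter

# Split at whitespace that follows . ! or ?, optionally with one closing
# quotation mark or bracket after the punctuation.
SENTENCE_END = re.compile(r'(?:(?<=[.!?])|(?<=[.!?]["\'”’)]))\s+')


def split_sentences(text):
    """Chop text into rough sentences, ignoring paragraph breaks."""
    flat = " ".join(text.split())  # collapse all runs of whitespace
    return [sentence for sentence in SENTENCE_END.split(flat) if sentence]


def repeated_sentences(text):
    """Return (sentence, count) for every sentence that occurs more than once."""
    counts = Counter(split_sentences(text))
    return sorted((sentence, n) for sentence, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = 'It rained. "Did it?" she asked. It rained. Yes!\nIt rained.'
    for sentence, n in repeated_sentences(sample):
        print(n, sentence)
```

Counting replaces the alphabetical sort here: instead of scanning a sorted list by eye, you get only the sentences that appear more than once, with the number of occurrences.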

If you did all of this in a table with a numbered index column, you could sort the file alphabetically, make changes, and then reassemble the text by re-sorting on the numbered index column. Probably not what you want to do with an 800-page work.
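On a work of that size the index-column trick is tedious by hand, but a script does not mind. Here is a minimal Python sketch of it, assuming the text is already a list of paragraphs (or sentences); the function name `drop_repeated_paragraphs` is invented for this example. It keeps the first copy of each exact repeat and discards the later ones, which may not be the copy you would choose to keep yourself.

```python
def drop_repeated_paragraphs(paragraphs):
    """Remove exact repeats, keeping the earliest copy and the original order."""
    # Number every paragraph: this is the "numbered index column".
    rows = list(enumerate(paragraphs))
    # Sort alphabetically by text; for equal text the lower index comes first.
    rows.sort(key=lambda row: (row[1], row[0]))
    kept = []
    previous = None
    for index, para in rows:
        if para != previous:  # first copy of this text in the sorted list
            kept.append((index, para))
        previous = para
    # Re-sort on the index column to restore the original reading order.
    kept.sort()
    return [para for _, para in kept]


if __name__ == "__main__":
    print(drop_repeated_paragraphs(["b", "a", "b", "c", "a"]))
```

Run it on a copy of the text, never the original, since deleting a repeated passage without rereading the surrounding material can leave a gap in the narrative.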