Finding duplicate paragraphs within one large text

brunseye · January 23, 2025, 8:16pm

I’ve compiled pieces of my writing in my ideas file (a text), which is about 300 pages long, usually formatted as lines or paragraphs separated by a carriage return and a dash. For instance:

-Lorem ipsum dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua.
-Ut enim ad minim veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur.

However, I’ve noticed that through years of cutting and pasting, I have many sections that are duplicated, some of them quite long.

Is there any feature that would allow me to detect repeated or duplicated passages, paragraphs or lines (not individual words) within this huge text, without already knowing what they are (i.e. without typing a specific word into the Find search)?

GoalieDad · January 23, 2025, 10:33pm

Not sure how many words can be searched at a time, but you could try searching for 3-4 words with the “all” search option and see what comes up. That would pull up all documents containing all the words, but you could easily see where all 3-4 appear together, signifying the paragraph you want. I would be interested to see how that works. I’m away from home so I can’t try it.

kewms · January 23, 2025, 11:13pm

The way I would do it is to split the document into more manageable chunks, using the dash as a delimiter. Don’t name the chunks, allow Scrivener to autofill the title based on the document text. Then use the Outline view to sort by the document pseudo-titles.
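
Outside Scrivener, the same split-and-sort idea can be sketched in a few lines of Python. This is only an illustration, assuming the text has been exported to plain text and that every entry begins with a dash at the start of a line, as in the example in the first post. The function names and the sample text are made up for the sketch.

```python
import re

def split_entries(text):
    """Split the text into entries, using a dash at the start of a line as the delimiter."""
    parts = re.split(r"(?m)^-", text)
    # Drop the empty piece before the first dash and trim surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

def sorted_entries(text):
    """Return the entries sorted case-insensitively, so identical ones end up adjacent."""
    return sorted(split_entries(text), key=str.casefold)

# Hypothetical sample standing in for the exported ideas file.
sample = "-Beta line.\n-Alpha line.\n-Beta line.\n"
for entry in sorted_entries(sample):
    print(entry)
```

Once sorted, repeated entries sit next to each other, which is the effect sorting by pseudo-title in the Outline view is meant to give.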

auxbuss · January 23, 2025, 11:15pm

Just adding a ref to a previous discussion: “Is there any way to look for duplicate text?”

AmberV · January 23, 2025, 11:21pm

The thread linked above is specific to the Windows version, which does not yet have the feature described in §13.4.4, Matching Text Finder, in the user manual PDF. There is also a project search function, documented in §11.1.2 under the subheading Finding Duplications, which is faster for finding full duplicates in the intended search scope (Titles, Text, etc.).

brunseye · January 24, 2025, 1:40pm

Thanks for all the suggestions! However, as I say, I’m not looking for duplicates of specific phrases that I already know I’ve duplicated, rather, any duplicates that have appeared in this 300-page single document that I’ve produced over ten years of work but am unaware of. With frequent copies and pastes and re-arrangings, I’ve noticed now that many things appear twice, and I’d like to eliminate all the duplication everywhere.

brunseye · January 24, 2025, 1:54pm

I should specify, I meant “compile” only in the sense of “assemble” from various notepads, dreams, scraps of paper etc. and not in the Scrivener-specific meaning. Poor choice of words!

brunseye · January 24, 2025, 2:00pm

So looking at that old thread, it seems as if this is exactly what I’m after: “The planned feature we have will be generally superior in every way, as it will examine the full text of the current outline chunk you’re working with and then scan for sentence-or-greater duplications of that text throughout the entire project, and generate a list of all documents containing those ranges of text, sorted by incident weight and highlighting them for you in both documents.”

And you’re saying this is only in the Windows-specific version? If I can get access to a PC, do I get access to the Windows version as a Mac user… and then I’d be able to use this very handy feature!

AmberV · January 24, 2025, 2:05pm

It’s the other way around. The user manual references I listed should be in your copy, in the Help menu.

brunseye · January 24, 2025, 2:25pm

Ah! Yes I see now. That is super useful indeed… that was going to be my next endeavor, to try to find the sources of all of the duplicates in the various other pieces of writing that are stashed here and there in the project. That function has indeed done that. Really great to know, and that will be invaluable.

What I still can’t do is find the duplicates that appear within this one very very long text.

So far, the best solution I’ve found is pasting the content into MS Word, then running a macro that someone has developed to highlight duplicates within that document. Then I can just erase all the highlighted portions, and that should do it. Please do consider a similar function for Scrivener!
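
For anyone without Word, a rough equivalent can be sketched in Python. This is not the Word macro mentioned above, just a minimal illustration that assumes a plain-text export in which each entry starts with a dash at the beginning of a line. It only catches entries that repeat in full (ignoring case and spacing differences), not passages that partly overlap. All names here are hypothetical.

```python
import re
from collections import defaultdict

def normalize(entry):
    """Collapse whitespace and ignore case so trivially different copies still match."""
    return " ".join(entry.split()).casefold()

def find_duplicates(text):
    """Map each repeated entry to the 1-based positions where it appears."""
    entries = [e.strip() for e in re.split(r"(?m)^-", text) if e.strip()]
    positions = defaultdict(list)
    for index, entry in enumerate(entries, start=1):
        positions[normalize(entry)].append(index)
    return {key: where for key, where in positions.items() if len(where) > 1}

def drop_duplicates(text):
    """Keep the first occurrence of each entry and rebuild the dash-prefixed text."""
    seen = set()
    kept = []
    for entry in re.split(r"(?m)^-", text):
        entry = entry.strip()
        key = normalize(entry)
        if entry and key not in seen:
            seen.add(key)
            kept.append("-" + entry)
    return "\n".join(kept) + "\n"

# Hypothetical sample; the third entry repeats the first with different case and spacing.
sample = "-One idea.\n-Another idea.\n-one  idea.\n"
print(find_duplicates(sample))
print(drop_duplicates(sample), end="")
```

Reviewing the output of `find_duplicates` before calling `drop_duplicates` is the safer order, since the second function discards later copies without asking.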

AmberV · January 24, 2025, 3:59pm

You could certainly split up a really long import into arbitrary lengths (it really doesn’t matter too much so long as they are sufficiently short), and then use the matching text feature to scan other parts of it that way. When you are done, the Merge command will bring it all back together if you really want it to be. Honestly I prefer even old drafts and research to be broken up a bit though, as it’s easier to navigate in the binder and link to things.

Consider importing it again with Import and Split, which can save a lot of time if there is something useful to break by, like a series of hyphens between thoughts, or the use of styled headings.