Matching text

It’s such a great feature. Would there be an option somewhere to save the text matching results once the matching process is done and to update them only when needed, manually? Presently, the matching process is initiated at the document selection or when the tab is activated even if there is no need for the results to be refreshed. This process can take some time for larger projects.

If that is not yet possible, would you consider this as a possibility for future versions? It could be an interesting feature for research documents: static documents whose matching text results do not need to be updated automatically and regularly. Thank you.

You mean to keep an index of possible search strings and their matches? A hash table?

I realize my question was not very clear: I am referring to the matching text function in the bookmark panel, which lets you search the project database for similar text in other documents.

The way this function works now is that when a document (A) is selected and the bookmark tab activated, the search for matching text is automatically launched for the document (A) in view. Once the results are found though, if I switch to a different document (B) and then later come back to the document (A), the search for matching text is launched again from zero. My question is if there is an option somewhere to keep the results of a matching search once the matching text search has been executed for a specific document?

The only way to turn the search off is to dismiss the panel from view. So switching it back to Bookmarks, until you specifically want to see it again, is probably the best approach.

In all the time I’ve extolled the virtues of Scrivener’s bookmarks feature, I never even noticed that besides “Document Bookmarks” and “Project Bookmarks”, the dropdown list included “Matching Text”. Likely, I wouldn’t know what it was for without looking it up in the manual. I’ll have to keep that one in my back pocket for the next time I merge two inter-related projects that might have shared some text.

Speaking of the manual, @AmberV , paragraph 2 of section 13.4.4 contains the word “identity” in the phrase “looking to identity items” when I’m pretty sure you meant “identify”.

Thanks Amber. I guess you have indirectly answered my question, which was about the ability to save the search so it does not have to be redone each time. I find the matching text results quite useful, but as I have a large database, it takes a bit of time to display the results each time. Could the ability to save the results (i.e. not refresh them) be a feature in a future version of Scrivener?

It is a wonderful feature. The compared text is even highlighted to show where there is an actual match down to the word level and the results are ordered by degree of similarity.

I would suspect it wouldn’t be too hard to add that. At the moment the limiting factor appears to be that you can only select one line at a time in this view. Curiously you can already copy, which produces a hyperlink to the item you copy from. So if you could Select All and then Copy, you could then paste that into Document Notes or somewhere convenient. But I can’t say whether that will ever actually happen, it just looks like most of the ingredients are already there.

Thanks! It looks like I already fixed that typo at some point, as it is reading “identify” in the source.

Indeed, I just tried it. It is a feature that also works from the binder actually.

Select All and Copy would not display the results of matching text though, it would create a list of links to all the documents listed in the results. Very simply (or as simply as I can put it), what I am really after is for the search results to be persistent until manually updated.

Let’s imagine I have a document (A) and I have done the matching text search for this document (A). If I go work on document (B) and then come back to document (A), I would like to see the results from my former matching text search. The reason again is that the larger the database, the longer this matching process takes. So if the matching search for each document is saved and only updated when manually triggered instead of being automatically updated at each document opening (at least as an option in the settings if not by default), then it would save processing time and probably energy resources too, although it could become a memory usage issue.
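To make the request concrete, here is a minimal sketch of "persistent until manually updated". It is not how Scrivener works internally; `MatchCache`, `get`, `refresh`, and the `search` callable are hypothetical names chosen for illustration. Results are remembered per document, and a new search runs only on the first visit or on an explicit refresh.

```python
from typing import Callable, Dict, List


class MatchCache:
    """Remembers matching-text results per document until told to refresh.

    `search` stands in for the expensive project-wide matching step; it is a
    hypothetical callable, not a Scrivener API.
    """

    def __init__(self, search: Callable[[str], List[str]]) -> None:
        self._search = search
        self._results: Dict[str, List[str]] = {}

    def get(self, doc_id: str) -> List[str]:
        # Run the search only the first time a document is visited.
        if doc_id not in self._results:
            self._results[doc_id] = self._search(doc_id)
        return self._results[doc_id]

    def refresh(self, doc_id: str) -> List[str]:
        # Manual update: discard the saved results and search again.
        self._results[doc_id] = self._search(doc_id)
        return self._results[doc_id]
```

Switching from A to B and back to A would then call `get("A")` twice but search only once. The trade-off mentioned above is visible here too: every visited document keeps its result list in memory.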

This Feature May Cause Excess Battery Usage
If you are using Scrivener in a mobile context, you will want to leave this list off unless using it, as it will incur a significant energy penalty, by having to trawl through the entire project text every time you inspect a new thing.

Oh, I see what you mean. I’m not sure how that could be done in a way that didn’t seem buggy—particularly if you are using this tool to resolve duplication. If it didn’t update until you clicked a refresh button, then the results would appear confusing—like why is there an entry in the list but no lines actually highlighted as being the same when you click on it? It just seems to me as though we already kind of have that, just not in a format that could get confusing or require excess clicking during stretches where you always do: turning the pane on only when you need to reference it.

Oh yeah, and I even added a yellow box warning in the documentation, about leaving this pane on while using a laptop on battery. It’s certainly going to be a drain to be running this constantly. Shouldn’t be much of a memory issue in theory though, as Scrivener cleans up loaded content after it hasn’t been edited or viewed in a while. Even projects that have been open for weeks, where you use Scrivenings mode a lot, shouldn’t be holding much more than what is needed to do recent edits. And the trawl itself is not done on the disk but on the search index, which is always held in memory. It only goes to the disk if necessary, and to load the preview area.

By the way, in case the problem is the friction of turning it on and off (I don’t think the interface actually telegraphs this information anywhere), this is all fully wired up with shortcut keys:

  • ⌘6 always flips between Project & Document Bookmarks, even if you’re viewing Matching Text.
  • ⌃⌘6 always selects Matching Text.

So in practice it is really easy to only switch it on when you need it.

Thanks for your thorough answer, Amber. This might be a case of a picture being worth a thousand words as it would certainly offset English not being my primary language. So I appreciate your time and patience on this.

So to clarify: the matching text results for document A would not stay on screen regardless of which document is selected. Matching text results and document would of course stay synchronous: Document A > Matching text results for document A; document B > Matching text results for document B. By persistent, I mean that once the matching text search is done for Document A, whenever I come back to Document A, the matching text results from the last time I did the search for that document would be displayed without a new search having to be triggered. Again, this is all to avoid the ‘processing time’ of trawling through a large database.

I think I understand what you mean, but my point about how that could get confusing would still apply.

  1. We look at document A, and there are matching texts.
  2. We go to one of those documents that had matching text, and delete the duplicate text.
  3. We go back to document A. If it did not look again, if it did what you suggest, then the document we fixed in step (2) would still be in the list. However if you select it and look at it, the reason for it being in that list is now gone.
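The three steps above can be played out in a small sketch. The sentence-level comparison here is a deliberately crude stand-in, not Scrivener's matching algorithm, and `shared_sentences` and `matching_docs` are hypothetical helpers. The point is only to show how a saved result list drifts away from what a fresh search would report.

```python
from typing import Dict, List, Set


def shared_sentences(a: str, b: str) -> Set[str]:
    """Sentences that appear verbatim in both texts (crude split on periods)."""
    def split(text: str) -> Set[str]:
        return {s.strip() for s in text.split(".") if s.strip()}
    return split(a) & split(b)


def matching_docs(project: Dict[str, str], doc_id: str) -> List[str]:
    """Other documents that share at least one sentence with doc_id."""
    return sorted(
        other
        for other, text in project.items()
        if other != doc_id and shared_sentences(project[doc_id], text)
    )


project = {
    "A": "The river rose overnight. Nobody noticed.",
    "B": "The river rose overnight. The town slept.",
    "C": "An unrelated note.",
}

cached = matching_docs(project, "A")  # step 1: look at A, save the results
project["B"] = "The town slept."      # step 2: delete the duplicate text in B
fresh = matching_docs(project, "A")   # what a new search would report
stale = [doc for doc in cached if doc not in fresh]
```

After step 2, `cached` still lists B while `fresh` is empty, so B ends up in `stale`: an entry in the list with nothing left to highlight.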

Now you might not be confused by that sequence, you just did it after all, but say this happened days ago, and now half of the result list doesn’t make sense. What if someone does not notice the little refresh button to click on? It just seems to me like it would make the feature more complicated to use, because you would have to remember to refresh it when it matters. When it doesn’t matter you can ignore it.

But if it doesn’t matter, why does it need to be open? This is the question I keep coming back to. If the list does not matter, and we do not care if it is accurate or not, then it can be closed.

Ah…I guess I almost forgot that nowadays I am using this function, and Scrivener in general, less as a screenwriting app than a research tool for materials that are quoting each other abundantly. This matching tool has become surprisingly valuable in this regard.

From the perspective of drafting a document, be it a screenplay or a PhD thesis, redundancy is not desirable and the matching tool is a way to spot those. So I understand when you are saying that my suggestion may create the possibility for more confusion.

My suggestion would apply to a different, arguably specific, use case, where the matching tool is used for research, i.e. spotting redundancy to shed light on intertextual relationships in a large body of texts, with the ability to navigate through them. Scrivener’s matching tool is one of the best I have used so far. The only minor inconvenience is having to do the matching search at each document’s opening, hence my request. But I understand that Scrivener’s features have to be in line with its mission statement and intended use. Thank you.

Oh I see, yeah that makes more sense from the perspective of how you are using it. It was designed more for cleaning up projects so they don’t have duplicates, or checking for accidental plagiarism. So definitely a tool you’d be using to track stuff down and fix it, and thus end up with less of a need to use it.

What it sounds like you’d really benefit from more is a tool that runs a big project-wide sweep once, or whenever something new is added to the binder. A different kind of tool I think, maybe even a report, where the duplicates themselves can be more easily spotted as “clusters” in a network of notes, rather than it being item-centric if that makes sense.
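As a rough illustration of that idea, here is a sketch of a one-off sweep that groups documents into clusters when they share a run of words. Everything in it is an assumption made for illustration: the four-word window, the `shingles` and `clusters` helpers, and the use of union-find to merge groups. It says nothing about how Scrivener compares text.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, n: int = 4) -> Set[Tuple[str, ...]]:
    """All runs of n consecutive lowercased words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def clusters(project: Dict[str, str], n: int = 4) -> List[List[str]]:
    """Group documents that share at least one n-word run, directly or via a chain."""
    sh = {doc: shingles(text, n) for doc, text in project.items()}
    parent = {doc: doc for doc in project}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair once; merge the groups of any two that overlap.
    for a, b in combinations(project, 2):
        if sh[a] & sh[b]:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for doc in project:
        groups.setdefault(find(doc), []).append(doc)
    # Only report groups with more than one member.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Comparing every pair grows quadratically with the number of documents, which is tolerable for a sweep that runs once or only when something new is added, but it would be a poor fit for re-running on every document switch.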

A macro view would certainly be very helpful! I do like the granularity of working with specific documents as well, so ideally the macro and micro would be correlated. Qualitative analysis software does offer this type of document analysis, at a high price tag. One often dreams that all tools could be contained in one’s favorite apps or that they would somehow all ‘communicate’ and be integrated in a way only expert programmers can manifest!

I love the matching text feature. Perhaps I am imagining, but it seems to me that it works much better than some diff tools used by git.

The downside of using it though is that i. it can take a long time to run (which is understandable) and ii. in some of my projects, it can cause Scrivener to crash.

It’s been my experience too. I have tried many and they don’t come close to the accuracy, speed and functionality of Scrivener’s. So for a tool which had kind of a secondary role in the cast of characters, hats off!
