Search documents by All Whole Words

Julian_M1 · December 11, 2022, 2:46pm

Please could you add a real All Whole Words option to the search capabilities?

Rationale:

It seems not widely appreciated, though it is correctly described in the manual, that the “words” in “All Words” and “Any Word” searches are not “words” in the sense of a word as would be found by a regex between “\b” boundary delimiters, i.e. they’re any/all sub-string searches. (Maybe the current labelling arose because the search terms are seen/entered as space-delimited “words” in the search box, and it might be a tad too technical to refer to sub-string searches?)

This led to much confused discussion, and by way of workaround, development of an external software utility that would do precisely this.

It is needed as a separate option because it is not practical to do it in Scrivener using other tools: regex is the obvious choice, but creating a multiple-word, any-order regex is time consuming and error prone. The workaround of doing a single word search, “hoisting” the results, then searching within them, and iterating is just about viable for two words, but impractical for more than two.
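To illustrate why the hand-written regex route is fiddly, here is a minimal sketch of what a multiple-word, any-order, whole-word regex looks like when generated rather than typed. It uses Python's `re` module, so the syntax may differ from the regex flavor available in Scrivener's search box; the function name `all_whole_words_pattern` is made up for this example.

```python
import re

def all_whole_words_pattern(words, ignore_case=True):
    """Build one regex that succeeds only if every word occurs as a whole word.

    Each word becomes a lookahead of the form (?=.*\\bWORD\\b), so the order
    in which the words appear in the text does not matter.
    """
    lookaheads = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in words)
    # DOTALL lets ".*" cross line breaks, so the whole document is scanned.
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile(lookaheads, flags)

pattern = all_whole_words_pattern(["John", "Liz"])
print(pattern.pattern)  # (?=.*\bJohn\b)(?=.*\bLiz\b)
print(bool(pattern.match("Liz waved. John waved back.")))    # True
print(bool(pattern.match("Johnson met Eliza at the door."))) # False
```

Even for two words the generated pattern is not something most people would want to type by hand, and every extra word adds another lookahead.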

Use Cases:

Searching for people, things, adjectives etc. that occur frequently as substrings of other words. For example, I want to find all documents containing “John” and “Liz” excluding “Johnson” and “Eliza”, or “tab” and “hair” without hitting “table” and “chair”.

Even with my external utility, I cannot then go directly to the found documents (unless there is an API to call Scrivener to open a particular content.rtf so that it is not some anonymous document but is properly located in the binder, etc.?).

NB One could also use such a capability to do character & other inventories based on user supplied lists; that would also be handy, e.g. for finding out in which document A & B first meet, or for continuity checking on things.

Extra

I don’t think this should be hard or time consuming to do, he said, speaking like a fully-detached manager: all * one needs to do is iterate over all docs in scope, repeating whole word searches per doc until either they all match or one match fails (early out). You already have the document iteration capability, the results presentation capability, etc.; all you need to add is a multi-pass whole word loop per document. (* +UI update, documentation change, test…)
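The loop described above can be sketched in a few lines. This is only an illustration of the idea, not Scrivener's code: the `documents` mapping of titles to plain text is a stand-in for whatever document iteration Scrivener already has, and both function names are invented for the example.

```python
import re

def contains_all_whole_words(text, words):
    """True if every word occurs in text as a whole word (case-insensitive).

    all() stops at the first word that is not found, which is the early out.
    """
    return all(
        re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE)
        for w in words
    )

def search_all_whole_words(documents, words):
    """Return the titles of the documents that contain all the whole words.

    documents is a hypothetical {title: text} mapping standing in for the
    documents in scope.
    """
    return [
        title
        for title, text in documents.items()
        if contains_all_whole_words(text, words)
    ]

docs = {
    "Scene 1": "John put the tab on the table while Liz brushed her hair.",
    "Scene 2": "Johnson and Eliza sat on the chair by the table.",
}
print(search_all_whole_words(docs, ["John", "Liz"]))  # ['Scene 1']
print(search_all_whole_words(docs, ["tab", "hair"]))  # ['Scene 1']
```

A plain All Words (sub-string) search would return both scenes for either query.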

Absurd Complication to be ignored (Q: so why am I mentioning it?)

For myself, I am thinking about checking to see whether the search term is referred to within quotes, or “exists” outside them; then I’d have an even better idea of who was in a scene and who had merely been referred to.

Answer: so I can find this idea again
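For the record, the quoted-versus-unquoted idea might look something like the rough sketch below. It assumes dialogue is wrapped in balanced double quotes (curly or straight) with no nesting, which real manuscripts will not always honour; `classify_mentions` is a made-up name.

```python
import re

# A quoted span: curly quotes or straight double quotes, no nesting.
QUOTED = re.compile(r'“[^”]*”|"[^"]*"')

def classify_mentions(text, word):
    """Count whole-word occurrences of word inside and outside quoted spans."""
    spans = [m.span() for m in QUOTED.finditer(text)]
    counts = {"quoted": 0, "unquoted": 0}
    for m in re.finditer(rf"\b{re.escape(word)}\b", text, re.IGNORECASE):
        in_quotes = any(start <= m.start() < end for start, end in spans)
        counts["quoted" if in_quotes else "unquoted"] += 1
    return counts

text = "Liz sighed. “John is late again,” she said. John walked in."
print(classify_mentions(text, "John"))  # {'quoted': 1, 'unquoted': 1}
print(classify_mentions(text, "Liz"))   # {'quoted': 0, 'unquoted': 1}
```

Multi-paragraph speeches that omit the closing quote mark would need extra handling.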

Thanks, Julian

JenT · December 12, 2022, 3:46pm

Perhaps I need another coffee this morning and am not fully comprehending the issue here (a baby kept me up a lot of the night, so this is entirely possible), but in my experience, the whole word search only searches for the exact word entered in the search. I tested your example of John vs Johnson and searched for John using the Whole Word Project Search, and it only returned instances of John, not Johnson.

Julian_M1 · December 12, 2022, 3:58pm

Sorry to hear about the sleepless night - ditto here (not baby but seasonal cold & hacking cough)

Anyway, hope you got some coffee… and to clarify: not “Whole Word” (one off) - that works fine. If you look at the linked thread you will see how the very common confusion was eventually acknowledged.

I would like (sorry for shouting but the emphasis does seem appropriate) ALL WHOLE WORDS, a concept difficult to communicate because of the (understandable) misapplication of “Word” in “Any Word” and “All Words” searches.

See the manual II.I (p213 in my version, quoted below) and note the distinction made under “Whole Word”, showing that the other searches do not search for Whole Words but take whole “words” as search terms. So an All Words search is not All Whole Words but rather All ‘Words’ as sub-strings, where a ‘Word’ is a set of characters between spaces in the search box.

Operator

Select the method by which your text will be used by the search engine:
— Any Word: the default search method. Queried documents must contain at least one of the words typed into the search field. Analogous to logical OR.
— All Words: every word entered into the search field must be present. Documents which only match some of the words will not be returned. Words can be entered in any order. Analogous to logical AND. You can also enter double-quoted phrases mixed in with single words, working in the same manner as Exact Phrase, below.
— Exact Phrase: what you type into the field will be queried precisely as it is typed in. “The book” will only match documents that have the phrase “the book” as written, not documents that just have the word “book” or the phrase “book the”. It will also return documents that contain “the books”. For exclusive matching, use Whole Word.
— Whole Word: unlike any of the above search methods, the term supplied will only match whole words. A search for “Jo” will only return documents with that word, not documents that also contain “Jocelyne”.
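To make the distinction concrete, here is a toy model of the four operators as I read the manual text above. It is an illustration of the described behaviour only, not Scrivener's implementation; it assumes case-insensitive matching and leaves out the double-quoted phrase handling mentioned under All Words.

```python
import re

def any_word(text, query):
    """Any Word: at least one search term occurs as a sub-string (OR)."""
    return any(term.lower() in text.lower() for term in query.split())

def all_words(text, query):
    """All Words: every search term occurs as a sub-string (AND)."""
    return all(term.lower() in text.lower() for term in query.split())

def exact_phrase(text, query):
    """Exact Phrase: the query occurs as typed, still as a sub-string."""
    return query.lower() in text.lower()

def whole_word(text, query):
    """Whole Word: the single term must sit between word boundaries."""
    return re.search(rf"\b{re.escape(query)}\b", text, re.IGNORECASE) is not None

text = "Johnson and Eliza took the books."
print(all_words(text, "John Liz"))     # True: both are sub-strings
print(exact_phrase(text, "the book"))  # True: "the books" contains it
print(whole_word(text, "John"))        # False: only "Johnson" is present
```

What I am asking for is the missing combination: the AND of `all_words` with the boundary test of `whole_word`.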

drmajorbob · December 13, 2022, 7:34am

The feature could be useful, I admit, but I suggest searching for Johnson (for instance) and assigning a keyword called “character” (for instance) to every document matched. Do the same for all the words you’re interested in, and a search for the character keyword gets you what you originally wanted, without regex or nested searches. For a dozen characters, I could probably do it in under a minute, and then I’d save the final search as a collection.

Kevitec57 · December 13, 2022, 11:32am

Har-har matey, har would never find hair.

Julian_M1 · December 13, 2022, 11:51am

PS E&OE (fixed it)

JenT · December 13, 2022, 3:19pm

Sorry, I started to reply to this yesterday and got interrupted. Your explanation did clarify it for me—just mushy mom brain, not a bad explanation on your part! And you’re right, I can see why a person would want to use that.

I was going to suggest something similar to @drmajorbob for a workaround—using Inspector tools like Keywords would allow you to get something similar functionally with a little extra markup work on your part. This is, of course, easier to do as you’re going along; a bit cumbersome to go back and do later.

Julian_M1 · December 13, 2022, 3:24pm

No worries. Information housekeeping is hard at the best of times, and when the focus is elsewhere even important stuff gets forgotten…

The alternative suggestions for specific use-cases are all worthy in their own ways, but the general capability would be worth so much more – and a very natural completion to the existing search facilities.