Search returns false positives

annekaelber · June 29, 2023, 10:28pm

I am having trouble with the search (the one over the binder). I’ve read the documentation and searched here to see if someone else might have already asked, but nothing has helped.

My search: I want to find ALL instances of the # character in my main binder (renamed “Journal”). My search settings are All, any word, include, exclude. In the results, I’ve come across a couple files where the # character is not present. I’ve looked at the title, the summary, the comments/footnotes, metadata, everything. I can’t see what I’m missing OR how I could improve my search.

Context: I’m going through my writing journal to standardize my use of hashtags. I did not limit my use of the # character to just hashtags, which will in the future return results I don’t want. I’m changing out instances like “#3” to “No. 3”. My hope is to make it so I can find specific things more easily (otherwise, why keep the journal, yeah?). I don’t generally randomly read old entries unless I’m searching for something. But I know I have lots of “ooh, good idea, save for later” things scattered throughout.

If there are “false positives”, could there also be some files that do have the # character but are not coming up in my search? I would appreciate any guidance you might offer (even if it’s to say, “That’s a dumb way to do that – try my way.” As long as you tell me about your way!)

Thank you for reading!
Anne
(eta: typos)

Vincent_Vincent · June 29, 2023, 10:35pm

Since you are doing this with the goal of editing those that you find, I’d say you should rather display all of your documents as one big scrivening and use Edit/Find/Find... instead, as it will navigate you to the exact spot and will also have the occurrence already selected and ready for you to replace or edit.

As for project search (what you are currently using), I’ve never heard of false positives.
Either your settings ain’t quite right, or you have a # well hidden somewhere you don’t suspect (Metadata or Keywords, for example).

annekaelber · June 30, 2023, 12:16am

I tried it that way. Man was it a pain to navigate though. It kept hopping around and didn’t always land on the hash mark (if there was one). And opening the Inspector to check there for the character would re-size the editor, and I would no longer see where my cursor was. I had to click on previous and then go forward again, so I could see it.

I noticed something else that might be related: Every “false” positive document had an image embedded. Don’t know if that’s anecdotal or potentially helpful.

Julian_M1 · June 30, 2023, 12:59pm

I think the use of Scrivenings in this way is good in principle, but I found recently (3.1.4.1+) that repeated searching within large scrivenings caused Scrivener to crash - for me. It has been noted here - a messy post, I admit, but it is a repeated search issue.

CORRECTION: that issue was temporary and is no longer reproducible. Feel free to search in Scrivenings but note that dialog and F3 repeated searches work differently at present (for me at least).

Julian_M1 · June 30, 2023, 1:02pm

You could try turning on invisibles to see whether there are some lurking # characters, perhaps in URL anchor references around images.

I’d also recommend using REGEXES when having difficulty finding specific things. BUT since I am useless at them and have no inclination to learn how to write them I asked Chat GPT to “Provide a PCRE compliant regex to…”

Usually works… just omit the leading/terminal “/” from the code it provides. NB case sensitivity is controlled by the Scrivener menu, not any regex switch.

Julian_M1 · June 30, 2023, 1:06pm

Including Synopses, Notes I expect, but just in case.

It can be frustrating not knowing where Scrivener has found something; I think I posted something about an obscure location that gets searched but isn’t obviously searched a few months ago… if I find it, I’ll surface it for you.

It’s here, but after re-reading I don’t think it will help in this case.

AmberV · June 30, 2023, 6:12pm

The latter! I created a test project that had a positive and negative text match, and then three documents without the hash as text, each testing embedded, disk linked and binder linked images. I then ran a search for “#” and got four hits: the positive text match and the three image tests.

Upon opening the Files/search.indexes project file, I found images are marked with a single “#” like that. Here is the search index for one test document:

<Document ID="80508D6E-97ED-4217-8FD8-30DF102BF35C">
    <Text>Negative match with linked image.
#</Text>
</Document>

The test image was on the second line, after the first bit of text. So I’m afraid that is quite unavoidable in some cases. It would depend on whether images tend to be on their own on a line, and the hashes you are looking for aren’t on their own line.
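To make the effect concrete, here is a minimal Python sketch that reads index entries shaped like the one above and separates documents whose only hash sits alone on a line (a possible image placeholder) from documents with a hash inside ordinary text. Only the `<Document>` and `<Text>` elements come from the sample above; the `<SearchIndexes>` wrapper, the second document, and the function names are assumptions made for illustration, and the real file may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first <Document> is copied from the post above.
# The <SearchIndexes> root and the second document are assumptions.
SAMPLE = """<SearchIndexes>
  <Document ID="80508D6E-97ED-4217-8FD8-30DF102BF35C">
    <Text>Negative match with linked image.
#</Text>
  </Document>
  <Document ID="EXAMPLE-POSITIVE">
    <Text>Idea #3: save for later.</Text>
  </Document>
</SearchIndexes>"""


def classify(text):
    """Return 'none', 'placeholder-only', or 'real' for the hashes in text."""
    lines = [line.strip() for line in text.splitlines()]
    hash_lines = [line for line in lines if "#" in line]
    if not hash_lines:
        return "none"
    # A line that is exactly "#" is treated as a possible image placeholder.
    if all(line == "#" for line in hash_lines):
        return "placeholder-only"
    return "real"


def classify_index(xml_source):
    """Map each document ID in the index to its classification."""
    root = ET.fromstring(xml_source)
    return {
        doc.get("ID"): classify(doc.findtext("Text") or "")
        for doc in root.iter("Document")
    }


results = classify_index(SAMPLE)
for doc_id, kind in results.items():
    print(doc_id, kind)
```

This heuristic cannot tell an image placeholder from a hash the writer typed alone on a line, such as an old scene break, and it misses images that sit inline with other text, so it only narrows the list of suspects.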

From what you describe wanting to do, they may well be easier to look for. Someone looking for old scene-breaks might have a harder time. But if you’re looking to standardise away from “#3” as a format, then setting your search parameter to RegEx and searching for #\d+ would be much more specific of a thing to look for. Any hash followed by at least one or many digits. # ?\d+ would be a bit more flexible, allowing for the potential of a space between the hash and digit sequence.
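For anyone who wants to try those two patterns outside Scrivener first, here is a small Python sketch. It uses Python's `re` module rather than Scrivener's own regex engine, so treat it as an approximation; patterns this simple should behave the same way in both. The sample sentence and the `to_no_format` helper are made up for illustration.

```python
import re

# The two patterns suggested above.
HASH_NUMBER = re.compile(r"#\d+")          # hash immediately followed by digits
HASH_SPACE_NUMBER = re.compile(r"# ?\d+")  # same, allowing one optional space

# Made-up sample text covering the cases discussed in this thread.
sample = "See #3 and # 12, but not #idea or a lone # mark."

strict = HASH_NUMBER.findall(sample)
flexible = HASH_SPACE_NUMBER.findall(sample)


def to_no_format(text):
    """Rewrite '#3' or '# 3' as 'No. 3', leaving other hashes untouched."""
    return re.sub(r"# ?(\d+)", r"No. \1", text)


converted = to_no_format(sample)

print(strict)     # ['#3']
print(flexible)   # ['#3', '# 12']
print(converted)  # See No. 3 and No. 12, but not #idea or a lone # mark.
```

Neither pattern matches a hash that stands alone, so the image placeholders described above drop out of the results, and hashtags such as "#idea" are left for the separate clean-up pass.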

annekaelber · July 1, 2023, 12:24am

Once again, I’m tickled that I was able to provide enough for you to test it, too.

Understanding that is the likely cause is enough for me. The other aspect of the standardizing was the hashtags I’d already randomly used with no plan – and combining some unnecessary ones into one tag, etc. I’ve copy/pasted a list of my current hashtags to the Notes window, in the hope that I’ll refer to it if/when I forget a particular hashtag.

I feel like I might be ‘vacuuming the cat’ (again), but at a minimum, going forward, I’ll be able to apply hashtags and find them again in the collections I’ve created from the hashtags.

Thanks to all for the input!

Vincent_Vincent · July 1, 2023, 12:54am

Perhaps (since you are cleaning up) it’d be best to consider using a symbol (any symbol) that doesn’t mean anything in markdown… (?)

You just so happen to have picked the worst one to pick, in my opinion.

annekaelber · July 3, 2023, 7:10pm

You make a very valid point. I was being (mentally) lazy and just using the “hashtags” concept like I’ve seen it elsewhere. But there is no real reason to use the hash mark over something else… Hmm.

Before I go too far, I’ll see what symbols won’t conflict.

Vincent_Vincent · July 3, 2023, 7:44pm

The first thing you see landing on Babel’s website is… well… looks like it is a stupid kid’s thing.
So here is a screenshot that better represents why I linked to this app:

AmberV · July 3, 2023, 7:45pm

You should be safe with something else. I’ve had a look into this, and that aspect of converting images to “#” is a deliberate and unique step taken for one purpose: so that binder items with nothing but images in them continue showing a “filled page” icon. For performance reasons, we use the search index to populate icon states (and a lot of other things; index card preview text and so on) as the alternative is scanning each and every text file.

That said we’ll give some thought to making this better. For one thing it shouldn’t be finding those as matches for the actual punctuation mark. The Mac version already does filter image results out somehow. It’s just a matter of figuring out a good performance-friendly way of doing that on another platform.

annekaelber · July 3, 2023, 10:46pm

That image reminds me of my knit and crochet symbol fonts.