Exclude documents from Search Index?

Greetings

Scrivener’s searches rely on the search.indexes file in the project’s Files folder. Could Scrivener give users a way to toggle documents for inclusion in this index, or to set a top-level folder within which text indexing would be bypassed?

I have about 7800 documents in my project, lots of articles and so on. The overhead of cramming their text into a single XML file would be intolerable. Fortunately, Scrivener seems to know this, and indexes only a few hundred of my documents. But I can’t figure out how those documents are chosen. Is it based, perhaps, on how quickly I click “Abort” on the Project Backup popup when closing Scrivener? Or perhaps there’s an unstated maximum number of documents that Scrivener will index.

As most users don’t want gaps in their indexes, I’m concerned that one day Scrivener will be adjusted to make its indexing more vigilant. This would make for long delays and enormous overhead, for a capability I don’t need at all.

So I’d like a way to make the exclusions explicit. Best would be a checkbox to include or exclude documents from the text index. Also a project option to maintain a minimal search index file, with only titles, no text.

Just as we can exclude certain documents from the Compile, so also should we be able to exclude documents from the Index.

Thanks for considering – Jerome

Well, first of all, you shouldn’t be in a situation where only ~100 items are indexed. That is certainly a problem in and of itself, and not something that should just be patched over with exclusion properties. Have you confirmed, at the XML level, that there are only around 100 top-level entries, or is this something you’re observing from the search feedback? I’m asking because the search index is used for more than just searching. The icon state, for example, is calculated using the search index. One would expect the Binder to be a bit “unstable” if there are index problems.

In my experience, performance problems can occur in large projects, but usually only in very extreme cases. It’s hard to say whether yours qualifies, since 7,800 files may amount to only 80 MB, whereas another project may have only a few hundred items but a raw 30 GB of pure text, which would make for an intolerable delay when loading and closing the search index. Fortunately, that many gigabytes of research is quite rare. In most cases, plain text being so dirt cheap in terms of bytes, this isn’t a problem even for huge projects like yours.
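To judge which of those two cases a project falls into, one can total the bytes of its text files on disk rather than count items. The following is a rough sketch only: the function name, the folder you point it at, and the assumption that document text lives in .rtf and .txt files are illustrative and not confirmed Scrivener behaviour. RTF markup also inflates the figure relative to the plain text that would end up in the index.

```python
from pathlib import Path


def text_bytes_on_disk(folder, suffixes=(".rtf", ".txt")):
    """Sum the sizes of files under `folder` whose extension is in `suffixes`.

    Assumption: the project's document text is stored as files with these
    extensions somewhere below `folder`. Adjust both to match your project.
    """
    folder = Path(folder)
    return sum(
        p.stat().st_size
        for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

Calling `text_bytes_on_disk(...)` on the project's document folder and dividing by 1024 ** 2 gives an approximate figure in MB to compare against the 80 MB and 30 GB examples above.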

That doesn’t mean there is no merit in an exclusion switch, to be fair, but I think it would be better to spend development time on optimisation and on correcting whatever is wrong with the truncated search index first, and then consider adding features on top of it, if that makes sense. I’ll add the exclusion idea to our list for consideration, though. There are other reasons why it could be useful, beyond performance.

Hi, Amber. Yes, this is verified at XML level. There’s no stability issue, and, in fact, no evidence Scrivener is doing anything wrong, given that:

  1. I haven’t been searching, and MM advised that Scrivener would rebuild the index on a search if it needed to.
  2. I usually click “Abort” on the Backup popup when closing down, and have read that’s also an index rebuild time. And of course I haven’t been running manual rebuilds.

The XML for search.indexes has a full set of Document and Title pairs, from ID=3 to ID=7819. But it has an incomplete set of Text pairs. A closer look at the pattern shows full Text pairs from ID=3 to ID=163, and beyond that only 24 more entries with XML Text pairs, out of at least 7000 entries with text.
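For anyone wanting to run the same check, here is a minimal Python sketch of the count. It assumes each Document element carries an ID attribute and holds Title and Text as child elements; that layout is inferred from the description above, not from any documented schema, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_index(path):
    """Return (total Document entries, IDs of entries with non-empty Text).

    Assumed layout (not documented): <Document ID="..."> elements that
    contain <Title> and, when indexed, <Text> child elements.
    """
    root = ET.parse(path).getroot()
    total = 0
    ids_with_text = []
    for doc in root.iter("Document"):
        total += 1
        text = doc.find("Text")
        # Treat a missing, empty, or whitespace-only Text element as "no text".
        if text is not None and (text.text or "").strip():
            ids_with_text.append(doc.get("ID"))
    return total, ids_with_text
```

Run it against a copy of search.indexes rather than the live file; comparing `len(ids_with_text)` with `total` shows how many entries actually have indexed text.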

There’s no loss of stability because the remaining entries are indistinguishable to Scrivener from documents that actually have no text. Nothing’s malfunctioning. It’s only that Scrivener is loading significantly more slowly in 1.7.0.4, and I thought it might be struggling with index entries for a search I’ll never use. Hence the request for index exclusion capability.

Best Rgds – Jerome

Update: a way to suppress text indexing that seems to work quite well. I replaced search.indexes with a bare-minimum XML of the same structure:

<?xml version="1.0" encoding="UTF-8"?>
<SearchIndexes Version="1.0">
  <Documents>
    <Document/>
  </Documents>
</SearchIndexes>

Then fired up Scrivener. On exit, Scrivener created a search.indexes that had a full set of 7800+ Document pairs, but only a few Text pairs, all associated with the first document in the Binder. So I then put a blank unlinked doc in the first position, did it all over, and wound up with a clean, titles-only index file.
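The replacement step can be scripted. This is only a sketch of what I did by hand: the function name and the .bak backup naming are my own choices, and it should presumably be run only while Scrivener is closed, since Scrivener rewrites the file on exit.

```python
import shutil
from pathlib import Path

# The same bare-minimum structure shown above.
MINIMAL_INDEX = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<SearchIndexes Version="1.0">\n'
    '  <Documents>\n'
    '    <Document/>\n'
    '  </Documents>\n'
    '</SearchIndexes>\n'
)


def reset_search_index(index_path):
    """Back up the existing index, then overwrite it with the skeleton.

    Returns the path of the backup copy (hypothetical naming:
    the original file name with ".bak" appended).
    """
    index_path = Path(index_path)
    backup = index_path.with_name(index_path.name + ".bak")
    shutil.copy2(index_path, backup)  # keep the original in case it is needed
    index_path.write_text(MINIMAL_INDEX, encoding="utf-8")
    return backup
```

Copying the .bak file back over search.indexes undoes the change.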

I expect that adding or updating documents within Scrivener will create new Text entries for those docs, but that Scrivener will not attempt to add the full set unless I run a Rebuild. It’s easy enough to reset to the minimum, should the index get too cluttered. There’s clearly a performance gain (at the expense of search ability, of course). And search.indexes is down, for now, to about 1 MB in size.

Since tinkering with system files is never recommended, I am of course still hoping for an exclusion capability within the program. But this method is filling the bill for now.

Rgds – Jerome