Would converting the content of the many PDF documents I have in my projects to plain text noticeably speed up the ‘word’ indexing, or the searches that rely on it?
“A simple test in a blank project with one PDF and a project search should confirm that. …”
Thanks Ioa. I had done test searches for PDF text in my principal project: I got no ‘hits’.
“… Why is generating the search index a problem that needs optimising? … Normally the index is built and subtracted from as you use it.”
As I have posted/implied(?) elsewhere, even after a clean boot of my i5/8GB PC, with Outlook (~ 1GB *.PST) and a many-tabbed Chrome not yet running (my other principal applications), Scrivener’s first project search runs 2+ minutes, after which searches are indeed nearly instantaneous as you suggest. (Re “first project search”: My prior ‘writing’ application apparently did its indexing during its startup or carried it over as a discrete file. (It was, however, a DB affair, not a many-file arrangement.))
But slow, apparent re-indexing does happen occasionally at other times too during Scrivener sessions, after I return to S. from other web or large-file apps, or when I use them moderately+ with S.: The first project search back into S. does run longer than ‘nearly instantaneously’, its length perhaps dependent upon how much I’ve been using the other apps. ‘Clearly’ in these cases, there is RAM-Swapfile activity involved.
(I am not a pure/dedicated/focused ‘writer’, often as much a researcher who takes notes, and writes and emends extensive considerations and conclusions as they occur to me.)
And thank you for the “PDF is an extremely inefficient way …”. Though PDF text copy/pastes with bad formatting, being able to search it (as I knew I could), with the added benefit of qualitatively faster backups*, makes a routine PDF copy/paste conversion to TXT/RTF a slam-dunk. (Even for this 5’7” person!)
*For my principal project (text: 300 MB), backups vary from 1+ minutes to nearly 5.
I’ll admit that I could have easily run a short A/B test, but given the number of PDFs that I would want to convert to TXT/RTF for the many-file test I’d want to run, I decided to see if someone already knew.
Having converted the PDF text to TXT/RTF for search purposes, I’d import the PDF itself only once a general search showed it to be of use; I would then likely set that PDF as a sub-document of its search-copy or, again for backup-time reasons, attach it.
(Now all ‘we’ need is an easy way to remove inconvenient PDF formattings in the copy/pasted text!)
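For what it’s worth, here is a rough sketch of how that cleanup might be scripted outside Scrivener. It is only an illustration under my own assumptions: the function name clean_pdf_text is made up, and it assumes the usual copy/paste damage is hard line breaks inside paragraphs, words hyphenated across line ends, and runs of extra spaces, with blank lines still marking paragraph breaks.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Tidy text copy/pasted from a PDF (hypothetical helper, not a Scrivener feature)."""
    # Normalise Windows/old-Mac line endings to plain newlines.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words that were hyphenated across a line break ("gen-\nerated").
    # Assumption: a hyphen at line end is a soft hyphen; genuine compounds
    # split at that spot will lose their hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Blank lines are treated as paragraph breaks; single newlines as wrapping.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Unwrap the hard-wrapped lines into one line per paragraph.
        joined = " ".join(line.strip() for line in para.split("\n"))
        # Collapse runs of spaces/tabs left over from the PDF layout.
        joined = re.sub(r"[ \t]+", " ", joined).strip()
        if joined:
            cleaned.append(joined)

    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The index is gen-\nerated on first\nsearch.\n\nSecond   para\r\nhere."
    print(clean_pdf_text(sample))
```

One would paste the PDF text into a file, run it through something like this, and import the result as TXT. Multi-column pages, headers/footers and tables would still need hand-fixing.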
I thank you both for your comments & time. (And patience.)