2 questions: Actual S. file names; PDF indexing

dbc183c7 · November 28, 2016, 5:39pm

From within Scrivener, is there a way to see Scrivener’s number ‘name’ for a document?

Would there be much of a quickening of the ‘word’ indexing speed or of its being searched were I to convert the content of the many PDF documents I have in my projects to plain text?

dbc183c7 · December 12, 2016, 11:52pm

Reading today about the differences between the Mac and Windows Scriveners,
my having complete plain-text extracts of the PDFs in my project instead of the PDF files themselves might serve a purpose beyond possibly speeding up the indexing * and probably the project backups:
It would for Windows users actually include the contents of the project’s PDFs in our searches.

*From what I read, the answer to my query above seems to be the opposite: using complete-text extracts of the PDFs would in fact increase my indexing time (depending of course upon the amount of new text), because the PDFs are not now indexed in the Windows version?

True? … At least until the Windows v3 catches up with the Mac v3?

g

kewms · December 13, 2016, 12:30am

No, it is not possible to see Scrivener’s internal file name for a document from within Scrivener.

Nor is it recommended to use that name to access the document from any other tool. The only supported way to manipulate the contents of a Scrivener project is with Scrivener itself.

Some PDFs include a plain text transcription as part of the file. Beyond that, I’m going to leave any comment to someone more knowledgeable about the Windows version than I am.

Katherine

AmberV · December 13, 2016, 5:57am

I’ve never seen the Windows version read text out of a PDF. A simple test in a blank project with one PDF and a project search should confirm that. It’s not really something we can fix; that’s a limitation in the PDF toolkit used, so whether that is resolved in v3 I can’t say. We’re working to establish feature parity, but we cannot ourselves establish parity in the programming environment.

The main question I would have however is: why is generating the search index a problem that needs optimising? This ordinarily should not be done unless the program encountered an issue (crashed session, detected inconsistencies in the data set). Normally the index is built and subtracted from as you use it.
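
To illustrate the general idea of an index that is added to and subtracted from as documents change, here is a minimal Python sketch. It is not Scrivener's implementation; the class and method names are hypothetical, and it only shows why editing one document need not trigger a full rebuild.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index (hypothetical): word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of docs containing it
        self.doc_words = {}               # doc id -> words seen at last update

    @staticmethod
    def _tokenize(text):
        # Lower-cased set of word tokens; good enough for a sketch.
        return set(re.findall(r"\w+", text.lower()))

    def update(self, doc_id, text):
        """Add or re-index one document without touching the others."""
        new_words = self._tokenize(text)
        old_words = self.doc_words.get(doc_id, set())
        # Subtract words that disappeared from this document.
        for word in old_words - new_words:
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]
        # Add words that are new to this document.
        for word in new_words - old_words:
            self.postings[word].add(doc_id)
        self.doc_words[doc_id] = new_words

    def remove(self, doc_id):
        """Drop a document from the index entirely."""
        self.update(doc_id, "")
        self.doc_words.pop(doc_id, None)

    def search(self, word):
        """Return the sorted ids of documents containing the word."""
        return sorted(self.postings.get(word.lower(), set()))
```

Only the words that changed in the edited document are touched, so the cost of an update depends on the size of the edit rather than on the size of the project.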

As for the backups, yes, PDF is an extremely inefficient way of storing textual information. If you can get by without using a printer format to convey text, then go for it! If you have a system for doing so, it should be relatively simple to create an A/B test where one project is PDF and the other is text. Which backs up quicker? If you must frequently do so, is the extra text load worth it for rebuilding the search index?
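
One rough way to run that A/B comparison outside Scrivener is to time how long each project folder takes to compress. The Python sketch below assumes a zipped backup behaves roughly like zipping the project folder; that is an assumption, and the function name is hypothetical. It reads the files only and writes the archive to a temporary location.

```python
import tempfile
import time
import zipfile
from pathlib import Path


def time_zip_backup(project_dir):
    """Zip every file under project_dir; return (seconds, zip size in bytes).

    Approximates a zipped backup (assumption); the project is only read.
    """
    project_dir = Path(project_dir)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "backup.zip"
        start = time.perf_counter()
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(project_dir.rglob("*")):
                if path.is_file():
                    # Store paths relative to the project root.
                    zf.write(path, path.relative_to(project_dir))
        elapsed = time.perf_counter() - start
        return elapsed, archive.stat().st_size
```

Calling `time_zip_backup` on a PDF-heavy copy and on a text-only copy of the same project gives two (seconds, bytes) pairs to compare. Run each a couple of times, since disk caching can skew the first run.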

dbc183c7 · January 2, 2017, 5:59pm

From within Scrivener, is there a way to see Scrivener’s number ‘name’ for a document?

“No, it is not possible to see Scrivener’s internal file name for a document from within Scrivener.
Nor is it recommended to use that name to access the document from any other tool. The only supported way to manipulate the contents of a Scrivener project is with Scrivener itself.”

Thank you Katherine, but I know that fiddling with indexed data outside its ‘ecosystem’ is laden with data-confusion possibilities: Such fiddling was not in mind. (Though its repetition does bear doing.)
The reason I asked was that, since Windows Scrivener does not yet have a way to show my 20k documents/notes in a date sequence, I do that ordering outside S., in Windows Explorer. And having done so for a particular project’s files, it would be of use to be able to tie directly what Explorer ‘knows’ to what I can ID in S… (Without doing string project searches based on Explorer’s file content display, numbers being shorter than file-unique text strings.)

(I remember reading a while back that the outlining for Win/S. v3 might/will allow column/metadata click-sorting of the rows. However, that is months down the road, if at all. (cf. Note Ioa’s understandable caveat below about the possible limitations of the Windows toolkit.))
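
The date ordering done in Explorer can also be scripted. The Python sketch below is a minimal, read-only illustration; the function name and the `*.rtf` default pattern are assumptions, and it makes no claim about how Scrivener lays out a project. It lists file names with their modified times, oldest first, which gives the same numbers-and-dates view as Explorer but does not by itself tie a file to its title in the binder.

```python
from datetime import datetime
from pathlib import Path


def files_by_modified_date(folder, pattern="*.rtf"):
    """Return (file name, modified time) pairs, oldest first. Read-only."""
    entries = [
        (p.name, datetime.fromtimestamp(p.stat().st_mtime))
        for p in Path(folder).glob(pattern)
        if p.is_file()
    ]
    # Sort on the timestamp, not the name.
    return sorted(entries, key=lambda entry: entry[1])
```

Point `folder` at whichever directory is already being browsed in Explorer; nothing is opened for writing, in keeping with the advice above not to manipulate project contents outside Scrivener.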

dbc183c7 · January 2, 2017, 9:30pm

Would there be much of a quickening of the ‘word’ indexing speed or of its being searched were I to convert the content of the many PDF documents I have in my projects to plain text?

“A simple test in a blank project with one PDF and a project search should confirm that. …”
Thanks Ioa. I had done test searches for PDF text in my principal project: Got no ‘hits’.

“… Why is generating the search index a problem that needs optimising? … Normally the index is built and subtracted from as you use it.”
As I have posted/implied(?) elsewhere, even after a clean boot of my i5/8GB PC, with Outlook (~ 1GB *.PST) and a many-tabbed Chrome not yet running (my principal other applications), Scrivener’s first project search runs 2+ minutes, after which searches are indeed nearly instantaneous as you suggest. (Re “first project search”: My prior ‘writing’ application apparently did its indexing during its startup or carried it over as a discrete file. (It was, however, a DB affair, not a many-file arrangement.))
But slow, apparent re-indexing does happen occasionally at other times too during Scrivener sessions, after I return to S. from other web or large-file apps, or when I use them moderately+ with S.: The first project search back into S. does run longer than ‘nearly instantaneously’, its length perhaps dependent upon how much I’ve been using the other apps. ‘Clearly’ in these cases, there is RAM-Swapfile activity involved.
(I am not a pure/dedicated/focused ‘writer’, often as much a researcher who takes notes, and writes and emends extensive considerations and conclusions as they occur to me.)

And thank you for the “PDF is an extremely inefficient way …”. Though the PDF text copy/pastes badly formatted, being able to search it as I knew I could, with the added benefit of qualitatively faster backups * makes a routine PDF copy/paste TXT/RTF conversion a slam-dunk. (Even for this 5’7 person!)

*For my principal project (text: 300 MB), backups vary from 1+ minutes to nearly 5.

I’ll admit that I could have easily run a short A/B test, but given the number of PDFs that I would want to convert to TXT/RTF for the many-file test I’d want to run, I decided to see if someone already knew.

Converting PDF text to TXT/RTF for search purposes, I’d only import a PDF once a general search found it of use, which PDF I would likely set as a sub-document of its search-copy, or again for backup-time reasons, attach it.
(Now all ‘we’ need is an easy way to remove inconvenient PDF formattings in the copy/pasted text!)
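
For the formatting cleanup, a small script can handle the most common nuisance in copy/pasted PDF text: hard returns at the end of every line. The Python sketch below is one possible approach, not a feature of any tool mentioned here, and the function name is hypothetical. It joins wrapped lines into paragraphs and keeps blank lines as paragraph breaks.

```python
import re


def unwrap_pdf_text(text):
    """Join hard-wrapped lines into paragraphs; blank lines stay as breaks."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "index-\ning" -> "indexing".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines, then collapse all whitespace inside each paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

One caveat: a genuinely hyphenated compound that happens to fall at a line end (for example "plain-" followed by "text") will also be merged into one word, so the result is worth a quick read-through.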

I thank you both for your comments & time. (And patience.)

lunk · January 2, 2017, 10:00pm

A thought after having read your “thinking out loud” about your needs. If you have a lot of PDFs, wouldn’t DevonThink be a better choice of software, or even Papers 3? Both handle both organizing and searching PDFs…

AmberV · January 2, 2017, 10:34pm

Alas, Windows is bereft of such an option.

lunk · January 2, 2017, 10:50pm

Oh…

devinganger · January 2, 2017, 11:06pm

Depending on how heavily used it is, OneNote might be a suitable application for Windows users…but it’s not DevonThink.

brookter · January 3, 2017, 3:26am

Nota Bene is a (venerable and highly respected) academic suite which includes a text database (Orbis) with somewhat similar AI capabilities to Devonthink - and it can now index PDFs, apparently. The rest of the suite includes a word processor and a very capable reference manager – and all the elements are very closely integrated.

Orbis is a free-form text-retrieval system that converts your computer from a glorified typewriter into an indispensable tool for organizing your correspondence, research notes, lectures, field notes, lists, or articles and books you write. It does so by making everything you’ve ever written, along with data imported from outside sources, instantly accessible as you work, without requiring you to define keywords or remember filenames. Simply indicate the directories and/or documents you want to search, and Orbis will manage everything for you automatically, keeping track of files as they are edited, added, or deleted. Then, while you’re writing, Orbis can instantly show you every passage (from up to eight million different files!) that contains the given word or combination of words, in a simple and elegant table view (with keywords highlighted like a “keyword in context” concordance), with the full text shown in another window. You can then insert any passages into your open document with a single keystroke. (Or select a fragment, and paste it in.) And, most amazing of all, if the text retrieved is a note linked to an Ibidem bibliographic record, Orbis can automatically insert the correct citation into your document. You will never need to worry about inadvertent plagiarism again! By finding long-forgotten passages or displaying new relationships between disparate texts, and keeping track of sources, Orbis saves you from the rote, mechanical part of your job, all the while serving to jog your memory and excite your imagination.

That’s from their website: http://www.notabene.com/brochures/orbis.html

It’s a Windows application, but it can run on the Mac because they produce a bottled Wine version. It’s expensive, though. There is a 30-day trial version.

I’ve tried it out a couple of times and it’s an impressive suite. I think Devonthink is a better all-in data bucket and AI tool (it can cope with a far greater range of document types), but it lacks the seamless integration of the Nota Bene suite. If some malignant entity were to force me back to Windows, then I’d buy Nota Bene to try to replace Devonthink. Not sure what I’d do about Tinderbox though.

DavidR · January 5, 2017, 3:55pm

Just want to say +1 on Nota Bene–with full disclosure that I’m a long-time user and supporter of the software. It’s designed for researchers and writers, rather than office workers, and works well within that framework. I find that organizing material in Scrivener and then transferring to Nota Bene makes for a very successful workflow. But if you need to be able to search PDFs, then NB would be a good place to start. I don’t want to hijack the thread to promote other software; feel free to PM me with any questions.

dbc183c7 · January 9, 2017, 6:31pm

Thanks Lunk. Yup, in some ways, we over here do have fewer options.

And thanks Devin. I could use OneNote to organize and search the many PDF articles I have, but I’d rather search my PDFs (their content) at the same time I search Scrivener’s RTF files, without always duplicating searches or deciding each time whether ‘this’ is something that might have wider implications: I enjoy the serendipity.

Thanks Brookter & David. I do have NB, its v11+, just short of its v11.5, which is needed for the latest upgrade of Orbis (PDF, DOCX, etc. searching). It is good to know that it can be similar to DevonThink; but it still requires (required?) that I search separately its ‘textbase’ and whatever document I had opened – I’d pestered its support manager endlessly about integrating the searches (given its cost) – and its textbase did need periodic reindexing to keep it up-to-date, though the latest updates have reportedly qualitatively sped up the process.
After trying it for about ten months, I came to S.: The latter does not have quite as many features, but it is more friendly to this user and for whatever reason, more stable. (NB also requires my converting my writing to its own format, and a much greater effort to import it from Info Select than S. does and also a similarly difficult effort to get any file structure I had in it, out.)
You are right though Brookter: It is a full, useful, suite (every once in a while I consider using this or that function of it – eg, Ibidem);
And it is pricey (driven mostly by its small user base?).

Thank you all: As noted, the generalist that I shall always be enjoys serendipity, and I will therefore get much less done than otherwise – but with many more digressions!
But I do smile a lot!

PS: The reference above about the PDF-copy’s poor formatting – That ties back to Nota Bene … NB has a keystroke that works to remove hard returns from a user’s selected text. (And Info Select has a Find/Replace that does a very good job with several formatting characters.)

dbc183c7 · January 27, 2017, 9:00pm

As noted above, thank you all.

So I’ll close with the comment that …
It would be of use (to me, in Windows) were the documents’ internal ‘name’ among the metadata. (Though I do realize its being obtainable would open the door to the possibilities of extra-Scrivener file tampering.)

g