What is the maximum number of documents a project can have?

The manual says: “Scrivener can (…) handle many thousands of individual components (…) Scrivener has been tested against projects with millions of words in them (…) So for ordinary usage, you will never need to worry about limitations.”

What is “ordinary usage”? What does “many thousands of individual components” mean? And what is the result of “has been tested”? I wanted to find out, so I tested it myself.

I created a monster project with 25000 documents (research material):

  • 70% of the docs are texts: two thirds with fewer than 4000 words, one third with more, up to 35000 words.
  • 10% PDFs, from 1 page up to 900 pages (the Scrivener manual)
  • 10% images
  • 10% web pages

Positive: Almost everything works surprisingly well: writing, scrolling, navigating the project, searching within a document. Scrivener crashed only once in several hours of testing. When restarted, the monster project was restored perfectly, but the creation of the search index takes a long time.

Negative: Searching in the project is almost unusable. A search with * to call up all documents takes 8 seconds, and typing in the search field is badly delayed.

Conclusion: Because the search function is effectively unusable, it does not make sense to work with a 25000-document project.

What is the maximum number of documents that the search function can handle? To find out, I reduced the size of my monster project in several steps.

To be able to work and search properly, I have to limit my project to about 4000 documents. The search with * to call up all documents then takes 2 seconds, and (almost) normal typing in the search field is possible.

Summary: Too bad. A Scrivener project with 25000 documents would be functional if only the search worked as well.

Hint: Scrivener offers a (hidden) way of managing two projects in one, so to speak: in the trash and not in the trash. This allows the practical maximum to be doubled to 8000 documents. Documents and folders in the trash can either be ignored by Scrivener or be the only things it searches. So only half of the documents are “in use”, even though they are all in the same project. Done this way, you can still work and search normally with 8000 documents.

This is of course only an approximate number. The result might have been different if I had chosen a different mix of documents.

2 Likes

The manual is vague about this because there are no hard limits you could run into (unfathomable 64-bit numbers aside), only the delays you are willing to tolerate during certain operations. This will mainly be search, backup and crash recovery, and how fast hardware gets over the years will change all of that. What was less tolerable in 2012 is perhaps fine now. Backup times can be mitigated by setting up your own automation, and index rebuilds by not using the mobile version to edit the project (though the size is probably reason enough not to sync it). You can’t avoid the rare crash, but coffee breaks never hurt.

I do agree that we need a short delay timer on the search field though, or an optional commit model. The urge to show results instantly is typically the right one, but not having any delay puts a ceiling on the scalability of the UI that is reached long before usage would otherwise need to be scaled back.
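Just to illustrate the idea (a rough, generic debounce sketch, not how Scrivener’s search field is or would be implemented): each keystroke restarts a short timer, and the expensive search only runs once typing pauses.

```python
import threading

class DebouncedSearch:
    """Run a search only after the user has paused typing.
    A generic sketch of the delay-timer idea, not Scrivener's implementation."""

    def __init__(self, search_fn, delay=0.3):
        self.search_fn = search_fn
        self.delay = delay        # seconds to wait after the last keystroke
        self._timer = None

    def on_keystroke(self, query):
        # Each keystroke cancels the pending search and arms a new timer,
        # so the expensive search fires only once typing has stopped.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay, self.search_fn, args=(query,))
        self._timer.start()

search = DebouncedSearch(lambda q: print("searching for:", q))
for partial in ("m", "mo", "mon", "mons", "monster"):
    search.on_keystroke(partial)   # only the final query actually runs
```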

1 Like

In this forum, people have repeatedly expressed the wish for Scrivener to be able to search in different (open) projects at the same time. Some would even like to be able to search in closed projects. This is of course not possible. Not even DT can do that.

I wouldn’t object to a faster search function, of course :slightly_smiling_face: What seems more realistic to me (at the moment) is to improve what Scrivener can already do. The search in a project could be limited to different parts. Search only in parts A and B but not in parts C and D. If this could be set quickly and easily, it would be wonderful. :slightly_smiling_face:

Edit: I don’t know, maybe it would also help if you could deactivate “live” search. So you type in the search terms and then hit enter.

I may be misunderstanding you, but you can already do this, by using the Project Search ‘Search Binder Selection only’ setting, can’t you?

Yes, that’s right. But I find it inconvenient to work with this menu and to have to change its settings before and after every search.

I do it like this: some of the documents in my project have this icon :diamonds: in their metadata. This divides the project into two parts.

To exclude docs with the icon from the search, I type this.

To search only in docs with the icon, I type the opposite.

The problem is that, in order to exclude documents/folders from the search, Scrivener has to search for them first. :joy: Which is of course a disadvantage, because it slows down the search. The trash method has an advantage here, if only it were easier to set up.

Can you try this, please?

Create a collection from your diamond metadata. Don’t search for any text or anything else. You just want the docs with metadata = diamond. Save the result as a collection. (This is a one-time step, of course).

Now, whenever you just want to search in the diamond docs, click in the collection, cmd-a to select all the documents, then cmd-opt-r to reveal them in the binder.

Now you can do a ‘Selected Docs only’ search on all / text etc., without complicating the search by including the metadata element, since you’ve already reduced the number of target documents with the collection.

I can’t test how much (if any) quicker this would be because I don’t have any billion-document projects lying around, but does it make a difference to the speed?

That’s a clever thought … but a collection is nothing more than a search. Whether I click on a collection or enter the diamond in the search field, Scrivener does exactly the same thing afterwards.

The trick with the collection is actually very useful for complex searches that would otherwise take Scrivener several steps. I use that too. For example, this collection searches for all documents that have the label green, a certain status, and the keyword “neu”. The search is not faster, but it saves me typing.

[screenshot of the collection’s search settings]

But typing is the right keyword. With very large projects, typing in the search field is delayed so much that it is practically impossible. Deactivating the “live” function would help with this. Maybe :slightly_smiling_face:

@AmberV , would it be complicated to add an option to enable/disable “live” search?

1 Like

Yes, again there are probably some tactics we could take, involving modifications to the interface and such, but these are all ideas we’ve had on the shelf for a while and they are probably best addressed in a more comprehensive look at the feature. I don’t feel that putting a hard commit into the current interface would complement it; it would be more a case of trying to accommodate a rare edge case with something that isn’t in harmony with the design.

I see, I was just asking.

Thank you for your bravery!? Are the issues you describe Scrivener-specific, or are they hardware and OS problems? Do you work from local primary SSD or hard-drive storage, or from a project saved in Dropbox? Mac or Windows? If Mac, what processor family (Intel or Apple-native M series), and how many cores? Have you tried working with this same project on a machine with more RAM? With more available storage space?

I write non-fiction as well. Having easy, simple, intuitive, fast access to a ton of research docs and linked notes is a fundamental requirement for such work. Scrivener’s search features are why I use Scrivener. I have a long wishlist for better search and sort in Scrivener, but for now at least I find ways to suffer through. Scrivener is the best I have found.

Not sure how Scrivener performs searches. There are slow ways to code a search (perform searches on the raw documents themselves) and fast ways (first create indices of the raw documents and then perform searches on those indices). The first method is slow but requires no fancy pre-processing of the user’s raw documents. The second can be orders of magnitude faster, but requires pre-processing of documents into external indices and the necessary graph links to those indices. When should such indices be processed? Offline, when the user is not editing documents? As the user works, in the background, during idle cycles?

The Macintosh uses an index-based search system it calls Spotlight. Spotlight is almost useless for documents stored in the cloud. Latency becomes an issue when indices are updated in real time (in the background), and it is also a problem when indices are processed or stored for documents that live remotely (in the cloud).

Way back in the 1970s, code was written at MIT that was the fastest search code I have ever seen. It was made into a commercial product in the late 1980s. It would live-index every drive I attached to my Mac, and the searches were instantaneous (or indistinguishable from instantaneous) no matter how many terabytes of data were held in its indices. I have no idea how this was accomplished, though I suspect some sort of x-dimensional graphs or lattices. Not only were the searches instantaneous, but the found sets were displayed with equal rapidity, even (and this is the confounding part) when I searched for simple single-character strings like “e”. The company and its software disappeared just a few years after it was introduced. I think it was called “On-Technologies” (could be wrong). Also, and this is important, the indices themselves were incredibly small. When I requested an index from this software, it would build it in less than 5 seconds on any disc I fed it (of course discs held far less data than today, but processing and memory were equally restrictive).

What I am suggesting is that it is conceivable that Scrivener could maintain an index of all of a user’s projects, and perform fast searches and sorts of any or all of those projects, but it would require a more sophisticated indexing of document data. Google calls its indexing and navigation model “MapReduce”. MapReduce was derived from Google Earth and the need to build a hierarchical model for storing and navigating (searching) infinite data sets. I believe that Google Earth’s MapReduce code may have been stolen from code written in northern Europe at Napster.
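A minimal sketch of the difference described above, with a toy corpus (this says nothing about how Scrivener itself searches): the naive approach re-reads every document on every query, while an inverted index built once up front answers queries with a dictionary lookup.

```python
from collections import defaultdict
import re

# Toy corpus standing in for a project's documents (purely hypothetical).
docs = {
    "doc1": "Scrivener has been tested against projects with millions of words",
    "doc2": "The search index is rebuilt when the project is opened",
    "doc3": "Typing in the search field is delayed on very large projects",
}

def naive_search(term):
    """Slow way: scan every raw document on every query."""
    return [doc_id for doc_id, text in docs.items()
            if term.lower() in text.lower()]

# Fast way: pre-process once into an inverted index (word -> set of doc ids).
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"\w+", text.lower()):
        index[word].add(doc_id)

def indexed_search(term):
    """Answer a query with a dictionary lookup instead of a full scan."""
    return sorted(index.get(term.lower(), set()))

print(naive_search("search"))    # ['doc2', 'doc3']
print(indexed_search("search"))  # ['doc2', 'doc3']
```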

Scrivener builds a per-project search index, and updates it as the user works. It does integrity checks/updates both when the project is opened and when it is closed. The user can also force a manual rebuild of the index. (On the Mac, hold the Alt key and look for the File → Save and Rebuild Search Indexes command.)

1 Like

Rare? Searching in real-world situations, on large projects?

@Randall_Lee_Reetz My advice is not to spend too much time on details.

Are you sure you have to have everything in one project?

Scrivener can handle multiple or even many medium-sized projects wonderfully. Otherwise, you need another app like Devonthink for research material. Many people do that: they write in Scrivener and keep everything else in Devonthink. :slightly_smiling_face:

1 Like

Yes, rare on the large project front. I don’t know if the context was described in this thread completely, but this was in response to @fto, who if I recall correctly, has A Project, in the singular (millions of words), and would like to see that use case optimised. Strictly speaking, this is outside of the design of the system, where projects are generally meant to correlate more with real-world project scopes. On the other end of the spectrum, I have hundreds of projects, all rather tightly scoped. How I find them is with a larger system that exists outside of them, so I don’t so much face the problem of needing to find word “X” in project “Y”, in a way that would compel me to combine all of the projects from A to Y together, so that I can find X with one tool—if that makes sense.

To answer your earlier question in more detail, if you right-click on any project in Finder and open the package contents, and drill down into the Files subfolder, you will find a ‘search.indexes’ XML file. Open that in a coding editor, and you will see this is a plain-text dump of all the searchable strings, from metadata down to content. This of course speeds things up a great deal because you don’t have the disk bottleneck of opening thousands of RTF files to find something, but it does also have the downside of being one mammoth XML file.[1]
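For the curious, here is a rough sketch of scanning that file from outside Scrivener. The project path and the element/attribute names below (Document, Title, Text, ID) are assumptions for illustration only; open your own project’s Files/search.indexes to confirm the actual layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical project path; adjust to a real .scriv package on your disk.
tree = ET.parse("MyProject.scriv/Files/search.indexes")

term = "monster"
for doc in tree.getroot().iter("Document"):       # assumed element name
    title = doc.findtext("Title", default="")     # assumed child element
    text = doc.findtext("Text", default="")       # assumed child element
    if term.lower() in (title + " " + text).lower():
        print(doc.get("ID"), title)               # assumed attribute name
```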

This is a deliberate compromise. To go a step higher on optimisation we’d need something more serialised, less human-readable, perhaps an SQLite DB. The reason for the compromise is that the search.indexes file is, as it stands, one of the most useful tools for recovery in worst case scenarios. It is useful to both machine and human, and of the latter, even those without programming experience could puzzle it out—it’s designed to be that way.
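As a sketch of what that more serialised step might look like (purely illustrative, not something Scrivener does, and assuming your Python’s bundled SQLite was compiled with the FTS5 module, which is typical), a full-text index answers MATCH queries without scanning the raw text of every row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: one row per document, indexed by the FTS5 module.
con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
con.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Notes", "Scrivener has been tested against projects with millions of words"),
        ("Draft", "Typing in the search field is delayed on very large projects"),
    ],
)
# MATCH consults the full-text index rather than scanning the stored text.
for (title,) in con.execute("SELECT title FROM docs WHERE docs MATCH ?", ("search",)):
    print(title)
```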

But it’s not what you would do if you were looking to make a multi-million word “everything I’ve written since the age of 16” repository that can return results between keystrokes. That’s more where DEVONtechnologies has put their energy.


  1. That said, tech has progressed a lot beyond grep -R or ack, so to speak. There is a tool with the somewhat eccentric name of “The Silver Searcher” that can plough through hundreds of thousands of closed files, millions upon millions of words, in a fraction of a second. These are the kinds of innovations we might look to capitalise on in the future. With SSD speeds being what they are today, the days of indexing may be numbered. ↩︎

2 Likes

And indeed, DevonThink can search across multiple multi-million word databases as fast as you can type. I love it almost as much as Scrivener.

2 Likes