You ask some good questions. It has taken me about four years to “perfect” the system I currently use. I don’t for one minute think it is perfect, but in its current state it is superior to anything that came before it. By the way, thanks for the link to Alex Payne’s blog. What an interesting read; I enjoyed it. Odd to consider he is behind Twitter, something I consider completely frivolous and a waste of humanity on the scale of the E-Channel, but oh well. We are all entitled, as they say.
Four top-level categories: Record, Communication, Manifestation, and Information. Does it cover everything in life? No, but it covers the four main areas of input and output that exist in mine as they relate to a digital archive, and that is all that really matters to the system. Record is everything I take down and record. Communication is self-explanatory. Manifestation encompasses mainly creative pursuits, but can also address explanatory expositions that go beyond mere record or information. Information is mostly just documentation, either generated or collected. Things could get a bit blurry if you took them literally; naturally, all capital-letter items are technically manifestations. But I do not define tokens by what things are, but by what they represent to the future me. So if something will be useful as a reference, or cross-linked to from another document for elaboration, it is Information, even if it is a Record of something, or something I manifested. I feel this way of looking at identity assignment places more emphasis on the intention of a document than on the identity of the document. I don’t really care so much about what it was, but what it will be in the future when I try to find it.
Beneath that there is a strict three-level depth restriction. Nothing can be less or more. Something must have a super-category, a minor-category, and a topic or key. Example super-categories in Record are simple: Internal and External. I observe and record psychology or dreams into Internal, for example, but observe and record events into External. The minor-category allows for division of the super-category into logical parts. One of the above examples: {R2.1.Dream} is Record-2(Internal)-1(Observation)-Dream. Some branches might only have one super or minor category in them at the moment, but that is fine, as it enforces a rigid structure that reduces the proliferation of hair-splitting.
The fundamentals: reduce the top level to four broad categories and denote them with letters instead of numbers; use only two numbers; and keep super-categories as close to dichotomies as possible, to reduce complexity. Record is split into Internal and External; you cannot really address much with a hypothetical R3. Perhaps a religious person might put the supernatural in there, I don’t know. And finally, the most specific category, which is also the most prone to proliferation, is written down as a word, not a number. So there is no “5” = this person; I just write down the person’s name. Thus keywords can expand as much as they need to without increasing the need for memorisation.
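If it helps to see the shape written out, here is a minimal sketch in Ruby of what one of these tokens decomposes into; only the letter–two-numbers–keyword pattern and the R/I examples come from what I described above, the rest is just illustration:

```ruby
# Minimal sketch of the token shape: a top-level letter, two single-digit
# numbers, and a free-form keyword. Only the pattern and the R/I letters are
# taken from the examples above; the rest is illustration.
TOKEN = /\{([A-Z])(\d)\.(\d)\.([A-Za-z_]+)\}/

def parse_token(text)
  return nil unless text =~ TOKEN
  {
    category:       $1,       # e.g. "R" for Record, "I" for Information
    super_category: $2.to_i,  # e.g. 2 = Internal, within Record
    minor_category: $3.to_i,  # e.g. 1 = Observation
    keyword:        $4        # plain-English specifier, free to proliferate
  }
end

p parse_token("{R2.1.Dream}")
# => {:category=>"R", :super_category=>2, :minor_category=>1, :keyword=>"Dream"}
```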
I presume you mean the tokens I have been discussing. There are other codes, but they are largely just syntax expansions of MMD to suit me. For example, I use the date and time as a unique ID. Each entry in the system has its own date and time, and thus a unique way to link to it. Rather than provide the precise URI to the file in MMD linking syntax, I have a shortcut that requires just the unique ID, <|unique_id|>. My MultiMarkdown parser has been modified to intercept these codes and expand them out to full MMD syntax, in combination with a file search routine to pinpoint the precise URI for me. Thus when I render the file to XHTML or whatever, it gets a link to the actual file on the system. Cross-referencing is mindless and flexible. If I change where I archive everything on the filesystem, I just adjust a variable in the script that adjusts the URIs. The base documents themselves are therefore ignorant of file-system-specific information. This is a big problem with many of these applications, in that their linking ability is based on absolute URIs. Move a file, and they get confused. Move everything, and your cross-references are useless.
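I won’t reproduce the actual parser modification here, but the gist of it can be sketched in Ruby along these lines, assuming each archived file’s name starts with its unique ID; ARCHIVE_ROOT and the exact link output are stand-ins, not the real thing:

```ruby
# Rough sketch of the <|unique_id|> expansion idea, not the actual parser
# change. ARCHIVE_ROOT is the single variable to adjust if the archive moves,
# and the assumption is that each archived file's name starts with its ID.
ARCHIVE_ROOT = File.expand_path("~/Archive")

# Locate the file whose name begins with the given unique ID (a datestamp).
def resolve_uri(unique_id)
  Dir.glob(File.join(ARCHIVE_ROOT, "**", "#{unique_id}*")).first
end

# Swap every <|id|> placeholder for ordinary MMD link syntax pointing at the
# resolved file, leaving the placeholder untouched if nothing matches.
def expand_links(text)
  text.gsub(/<\|([^|]+)\|>/) do
    path = resolve_uri($1)
    path ? "[#{$1}](file://#{path})" : $&
  end
end

puts expand_links("See the earlier entry <|09106911|> for the details.")
```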
As for the tokens themselves: it is largely irrelevant how many of them exist in total, since the highly specific portion is plain English. I don’t need to memorise the difference between {I3.1.Doc} and {I3.1.Invoice}; it is not only obvious which is which, it can also be reasonably guessed what the {I3.1 part means just by looking at those two and the (I)nformation lead. Thus the memory-intensive parts are the two numbers in the middle. I try to keep super-categories to at least two and no more than four (though I have yet to exceed three). This took a good deal of premeditation to make sure the skeleton would not be inadequate as data was added to it. There are a total of nine super-categories I need to remember. Information has three branches and the rest have two. There are around 30 minor-categories, so an average of three per super-category. This might sound like a lot to memorise, but since they are hierarchical and logical it really isn’t that bad. I don’t have to memorise all thirty, but rather only the parts of each super-category individually; hierarchical memory is much better suited to the mind than linear list memory. I try to keep patterns repeating, too. So R2.1 and R1.1 are very similar except in direction: both are minor-category “Observation”, but the latter is External observation where the former is Internal observation. Likewise I2.1 is Indirect-Citation. Different concept, but similar to the notion of internal observation (at least to me).
Yes, but there has been extremely little expansion in super/minor categories, thanks to the amount of thought I put into it initially. Nearly all of the expansion happens in the suffix keywords, which was the original intention of the design. I’m not sure how many of those there are, but probably 120 or so, which means there might be 120 or 130 total tokens. But as I said, that is largely irrelevant, since the hard part is the 9 and the 30.
I cannot use DTP for two reasons. First, I don’t need it. My taxonomic system is good enough that I don’t need AI to tell me what is related to what, and whenever I’ve played with it, it has nearly always been wrong. If it is wrong about the stuff I know about, why should I trust it with the stuff I don’t? Second, the new version has no concept of punctuation or case-sensitive searching, which my token system naturally requires strict adherence to.
On the topic of file-system dependence: in fact, for a period of time last year, I did precisely what you are doing, and documented it rather thoroughly here in a few posts; I don’t remember where. I condensed my token expression down a bit so that it could be included in file names.
09106911-M-Scrivener-on_archival.md
Would have been this conversation. I have the date and time (that’s the unique ID, by the way) in the numerals (using my weird datestamp format), ‘M’ for communication, and then the final specifier. It doesn’t have the whole token, but in my experience this was enough to find things, as the title helps a lot. It looks kind of messy all by itself, but in a long list in Finder the hyphens make everything stand out nicely. I’d also use Hazel to automatically label everything with -M- or whatever in a certain colour. I could use grep to narrow things down further if necessary. Knowing the date is an astonishingly useful bit of meta-data when it comes to personal archival. I cannot recommend date emphasis enough. For multi-person archival it is much less useful, but if you know when you filed something, even only roughly, that right there can eliminate 99% of the chaff when looking for it.
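Pulling the meta-data back out of names like that is trivial to script, by the way. A rough Ruby sketch against the example name above; the field order comes from that example, everything else is illustrative:

```ruby
# Sketch of pulling the meta-data back out of a name like
# "09106911-M-Scrivener-on_archival.md". The field order comes from that
# example; the datestamp is treated as an opaque unique ID.
def parse_name(filename)
  stamp, letter, keyword, title = File.basename(filename, ".md").split("-", 4)
  { id: stamp, category: letter, keyword: keyword, title: title }
end

# Narrow a folder down by the category letter, the sort of thing Hazel's
# colour labels or a quick grep would otherwise do.
def by_category(dir, letter)
  Dir.glob(File.join(dir, "*-#{letter}-*.md")).map { |f| parse_name(f) }
end

p parse_name("09106911-M-Scrivener-on_archival.md")
# => {:id=>"09106911", :category=>"M", :keyword=>"Scrivener", :title=>"on_archival"}
```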
The reason I wasn’t using Boswell is that I had a bit of a hiccup with it that resulted in a corrupted archive. I back up religiously, so I didn’t actually lose anything, but it was disconcerting enough to abandon it. I tried a number of other programs: Journler (probably my favourite, but its development future became uncertain so I moved on), EagleFiler (which I quite liked because it was Finder with a nice tagging system on top), and even just Finder as you are doing (actually Path Finder, but yes). I had grown so attached to the Boswell way of doing things, though, that nothing could measure up. There are just some things that program does that nobody else gets close to. So in the end I went back to Boswell after more carefully researching what happened, learned how to avoid the problem, and I’m a happy Amber.
The problem with “everything buckets” is one of the reasons I like Boswell so much: it focusses on doing only one thing, organising information (not files). It is dirt simple once you get over the philosophical hurdles, and allows for a more transparent and fluid assignment of information than folders and files can (for me). Here is a practical example: every day, all of my journal entries (individual documents) land in a collector designed to look for them (this would be analogous to saving all of your diary entries into one folder, but I’m already ahead because the notebook filters are scattering these entries all over the system according to keywords). At the end of the day I go through that and pull out the ones I think are important to the 10-day period. I do this by selecting them, then selecting the 10-day notebook, and dragging the entries roughly over to the notebook list. In Boswell, drag and drop is “specify, then drag”, not drag to specify. You select the target, then drag to a vague area. Seems like a small thing, but if I want to drag a single entry into forty notebooks all at once, I can do that with one drag. Try making forty aliases in Finder with a single drag. It also allows inverse drags: dragging ten notebooks onto a selected range of entries assigns those entries to those notebooks. You can combine two complex and arbitrary lists in a single move. Nothing else out there exists like that, nothing. It is just one of the philosophical hurdles you have to get over, though, because we are so used to point-to-point assignment; it is all we generally have. The notion of eight-point to ten-point co-assignment is a bit weird, but absolutely fluid once you get it. This is the way the brain works.
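If the co-assignment idea sounds abstract, it is easier to see as data than as a description of dragging. A purely conceptual sketch, with nothing of Boswell’s actual internals in it:

```ruby
# Purely conceptual sketch of the "specify then drag" idea: one gesture is a
# many-to-many assignment between a set of entries and a set of notebooks.
# Nothing here is Boswell's actual data model.
def co_assign(entries, notebooks)
  entries.each do |entry|
    notebooks.each { |nb| nb[:entries] << entry unless nb[:entries].include?(entry) }
  end
end

ten_day   = { name: "10-Day",    entries: [] }
scrivener = { name: "Scrivener", entries: [] }

# One "drag": three entries into two notebooks at once, six assignments in a
# single move, and nothing is duplicated or relocated, only referenced.
co_assign(["entry-a", "entry-b", "entry-c"], [ten_day, scrivener])
p ten_day[:entries]  # => ["entry-a", "entry-b", "entry-c"]
```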
Every ten days I go through that notebook and pull copies of the entries I feel are important to a 40-day perspective into a 40-day bin. Again, this is like dropping five aliases into a folder; the original entries are still in the main collector and the 10-day bin. Then once every 200 days I pull the entries that are important to that cycle. I now have all diary entries in a huge notebook collecting all R1.1’s, 20 10-day bins, and 5 40-day bins. If I want to go back to the first part of 2006, I can read the entries in the 200-day bin from back then; if something interests me I can drill down into the smaller-level bins to get more context. Since all of these entries are “clones” scattered everywhere, they accomplish something that is very difficult to do with a file system and folders: they allow you to approach your data from multiple perspectives and intentions. The system described above is largely chronological, but what if I don’t remember the date? That’s fine, I can probably find it in a topical notebook. I tried to duplicate this in folder-based applications (including Finder) but it gets messy, really fast, if it is even possible at all. DTP has replicants, but it just is not as elegant. It’s five clicks and a bunch of sub-menus for one alias at a time, for what two clicks and a drag does in Boswell for any number of aliases, where every single entry in the entire database is no more than three clicks away.
Another thing Boswell has that little else on the market can do: functional searching. That means I can not only retrieve information in a Spotlight fashion, but act on that information. This would be like creating AppleScripts, only infinitely easier. A simple example would be if I wished to create a new notebook to hold everything regarding Scrivener. As mentioned in the prior post, I would construct a search pattern for EndsWith .Scrivener}, and tell Boswell to move all matching entries into the Scrivener_NB and a few other notebooks as well. Done. It’s like creating a thousand aliases from a Spotlight search result. The beautiful thing is that now I can use that Scrivener notebook as a search source. I can rapidly split out five other notebooks pertaining to specific aspects of Scrivener by using that notebook as the data cluster instead of the entire archive. This speeds up searches and negates the need for complex cascading search terms. In Spotlight, I’d have to re-define the EndsWith and then tack on any specifics. With Boswell, I know the Scrivener notebook already has everything I need, so I just search within it. Since notebooks are like folders, in that you can manually add and subtract information from them, they are even more powerful than additional search criteria. Searches become more useful as you prune the data clusters. This isn’t terribly novel to Boswell; plenty of programs have container-style search restraints, but many are artificially limited. Since Boswell is set up to search this way by default, it is very easy to describe search constraints with a dozen notebooks in a single click and drag. DTP has a search constraint, but it can only work on one container at a time. Spotlight requires setting up rule after rule.
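Stripped down, a functional search is just a filter plus a bulk assignment. A crude Ruby approximation; only the .Scrivener} pattern and the Scrivener_NB name come from the example, the data structures are invented:

```ruby
# Crude approximation of a functional search: find every entry whose token
# ends with ".Scrivener}" and file all of them into one or more notebooks in
# a single pass. The archive/notebook structures here are invented.
def functional_search(archive, pattern, target_notebooks)
  matches = archive.select { |entry| entry[:text] =~ pattern }
  matches.each { |entry| entry[:notebooks] |= target_notebooks }
  matches
end

archive = [
  { text: "{I3.1.Scrivener} note on compiling", notebooks: [] },
  { text: "{R2.1.Dream} an odd dream",          notebooks: [] }
]

# Everything matching lands in Scrivener_NB; the result can then itself serve
# as the data cluster for narrower searches later.
scrivener_nb = functional_search(archive, /\.Scrivener\}/, ["Scrivener_NB"])
p scrivener_nb.size  # => 1
```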
As you can divine, notebooks are really strange constructs in Boswell, and they are definitely one of the philosophical hurdles you have to approach if you are going to understand the software. They are simultaneously search results; manually assembled collections of data; documents (drag a notebook to the desktop and, poof, it is a single file, in the same vein as a compile out of Scrivener); filters (if a notebook’s name exists in the text of a document being archived, the document is automatically filed into it, so a notebook named Scrivener will automatically acquire every single document I archive with the word ‘Scrivener’ in it); and search terms when used as a source.
And the file system’s benefits? I have those too. Every ten days I dump everything I’ve archived in Boswell out to a file, then run a script I wrote which builds a directory structure and puts the entries into it. It’s a mirror I never touch; a back-up; something for everything to link to universally (using those unique IDs I mentioned); something for Spotlight to search against. When Boswell fails to help me find something (rare), it’s there. Back when I was using the file system, I had a pretty decent system. I wrote a bevy of scripts in Ruby to make things easier for myself, but in the end it was just too much work. Making a new entry meant opening TextMate, then saving and locating the right folder to save into. In Boswell, it’s Cmd-E and start typing. Done. File it, or let it sit around for a bit and stew. I don’t have the Untitled File problem mentioned in the follow-up. Untitled stuff just stays like that until I change it or want to think about filing it.
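I won’t pretend this is the actual dump script, but the mirror-building half of the idea is conceptually simple; something along these lines, assuming each exported entry carries its unique ID and token (the directory layout is a guess):

```ruby
# Sketch of the mirror idea: lay exported entries out as a directory tree so
# Spotlight, grep, and the unique-ID links have something stable to point at.
# The entry fields and the token-based layout are my assumptions here.
require "fileutils"

MIRROR_ROOT = File.expand_path("~/ArchiveMirror")

def mirror(entries)
  entries.each do |entry|
    # e.g. a token of "{R2.1.Dream}" becomes the folder R/2/1/Dream
    dir = File.join(MIRROR_ROOT, *entry[:token].scan(/[A-Za-z0-9_]+/))
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, "#{entry[:id]}.md"), entry[:text])
  end
end

mirror([{ id: "09106911", token: "{R2.1.Dream}", text: "..." }])
```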
Anyway, if you like the strictly file-system approach, I recommend learning grep if you haven’t already. Spotlight is fast, but it can also be frustratingly vague. Grep is slow, but extremely accurate. Also Hazel, which I mentioned above: Hazel is one of the best Mac applications out there. It is the Scrivener of file automation and can make a folder of files feel almost application-like.
I’d caution against using DTP as your email archive. It has only an extremely rudimentary understanding of the email header, and I’m not sure whether it can even output an mbox file, or whether it retains the full original copy of the email.
Update: I forgot to mention the one thing that always made me nervous whenever I wasn’t in Boswell: data security (which might seem ironic since it fubared my database once, but bear with me). Many of these applications support a form of locking, but it is, in my opinion, a very soft lock. The data cannot be edited, and in some rare cases the meta-data is locked as well, but in all cases you can still accidentally or even intentionally delete the file. Boswell just doesn’t let you. The only way to edit old stuff is to version it off to a new copy. There is simply no way to delete anything, period. At first it seems limiting, but once you get used to it, it makes everything else, especially raw file systems, feel absolutely dangerous.