Scrivener as Dedicated Research Database

Okay, I think my eyes just glazed over… :open_mouth:

If one of AmberV’s previous posts had not already done it, this one would have made me snort; I would need the mop now.

Douger,

There are several large posts I placed here a while back regarding the use of the file system as an archival engine. All based on utterly portable principles—no Spotlight hacks. I don’t trust myself to still be using Apple in 20 years, and I trust Apple to still be using Spotlight even less (and I’ve had situations where meta-data got lost when backed up onto non-Apple equipment. Not cool).

Your original triad is actually very close to my first crack at it as well. This was, I should say, prior to my eight-wide top-level system, which was a monster to keep track of. :wink: I decided to go for a strict four after reading an insightful blog post on physical index card filing and the Noguchi filing method. In fact, that first super-category dichotomy in Information is the distinction between citations and references on the one hand, and the actual works themselves on the other. I call it Direct and Indirect to make it even more flexible. i2 is a citation or link; i1 would be the article itself, mirrored; i3 is a special category (fact) for what are essentially footnotes in the archival system: supporting documents to provide context for the things I observe. I split Thoughts out from Creativity because I am, if nothing else, a manic journalist. I probably produce on the order of 35,000 to 60,000 words every forty days, purely internal thoughts and reflections on the world. So that is why I have an entire top-level branch dedicated specifically to Recording the present. Otherwise it would muck up the creative sections, though as can be expected there is a lot of cross-linkage between those two branches.

I prefer to express these cross-linkages in patterns rather than explicit double-token or binding-token usage. It’s lower maintenance. With an explicit system I have to anticipate the patterns I will be interested in four years from now. I submit that is too difficult, and potentially impossible. I would have to look at two articles in the archive and say, these should be connected, and affix a binding string between them. If, however, I use pattern-based implied linkages, I need never worry about it. One of them, as already pointed out, is the similarity in super/minor numbering. I can spot correlations by numeric similarity at the simplest level, and embed mathematical relationships between the numbers to provide secondary and tertiary meanings. This allows the archive to literally blossom outward like a crystallising pool of water, governed by internal physical models. Extremely low maintenance, and almost entirely retroactive in how it manifests. I place a lot of priority on “mindless” archival like that, because in my experience—with the volume of information I archive—if it requires more than five minutes of meta-data thought it doesn’t get done.

One thing I really like about this system is that it makes multi-year thread highlighting simple. If I have a thought one day and put it down, the moment I do I can see it snap into place with similar thoughts (via Boswell’s automated filing record) that I’ve had over the course of the decade since I started recording. Periodically, if the thread is striking enough, I’ll write a summary paper collecting all of their IDs together into a list, summarising the way in which it has polymorphed over the years, and sketching strategies for monitoring its further development. Given enough time, even these summaries become hits themselves. The archive is really far too big for something like Tinderbox anymore (I once had just a fraction of it in Tinderbox, but it was too much for the linear XML format), but in a way the token system allows a form of Tinderbox philosophy in textual terms. I’ve written a bit on the topic of re-archival and retro-linking as ways of passively enriching a data set. Most people prefer in-place editing of material to enhance a data pool. Boswell forces a read-only attitude, and from that I developed the notion of enhancing data pools by reiteration and multi-faceted expansion. The core data pool remains inviolate, but is annotated outward.

I like your idea of affixing a goal signifier. But with the read-only mentality I’m worried it would dilute itself into uselessness. I could possibly use the single piece of mutable meta-data that Boswell offers, the “tag” (not at all like keywords in most apps; it’s rather more like a status).

I mainly used Journler’s labels for my top four. This made chronological list scanning extremely easy. I liked it so much that I’ve adopted the system in the Finder as well. I have Hazel set up to look for “{M” et cetera and assign the colour label appropriately. I found the tagging system somewhat redundant, personally, but I did use the category field.
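A rough sketch of what such a rule does, in Python rather than Hazel. Only the “{M” token appears in the post; the other tokens and all of the colour pairings below are invented purely for illustration:

```python
# Hypothetical stand-in for the Hazel rule described above: scan a filename
# for a category token such as "{M" and decide which Finder colour label
# to apply. Token letters and colours here are assumptions, not the
# author's actual scheme.

CATEGORY_COLOURS = {
    "{M": "Red",     # e.g. Thoughts    (assumed)
    "{I": "Blue",    # e.g. Information (assumed)
    "{C": "Green",   # e.g. Creativity  (assumed)
    "{R": "Orange",  # e.g. Recording   (assumed)
}

def colour_for(filename):
    """Return the colour label for the first category token found, if any."""
    for token, colour in CATEGORY_COLOURS.items():
        if token in filename:
            return colour
    return None

print(colour_for("{M 09124 morning reflections.txt"))  # Red
print(colour_for("untagged note.txt"))                 # None
```

In Hazel itself this would be a “Name contains” condition paired with a “Set color label” action; the script just makes the mapping explicit.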

Hmm. I’ve said this before and I’ll say it again. I kind of recommend it. I’ll put it this way:

On the positive side: I have not found a single application on any platform that more aptly approaches the problems of personal archival. Boswell just gets it. In doing so it does some things that are profoundly weird. It can take quite a lot of fiddling and making mistakes to “get” the application. I know I barraged poor Will Volnak with page after page of feature requests and modifications, trying, in short, to get Boswell to work like everything else does. But he was gentle about it and said, no, you just don’t get it yet, keep trying. Well, I paraphrase, but that was the gist. After about a quarter of a year of steady usage I did finally get it. It’s profoundly strange not being able to delete things. You make a mistake, you want to get rid of it. But Boswell rightly asks, why? Drives are big, text is small. Why not just ignore it forever? And if, for some reason, what you think was a mistake actually wasn’t a mistake, four years from now you’ll have it. (There is a mechanism for ignoring things; it’s a bit like telling Spotlight to ignore a file while leaving it out of the Trash. It is there, but you’ll never come across it when searching. Or you can remove it from everything and it sits in the trash, ignored.) On the notion of filing—it combines the strengths of every system I’ve seen. It recognises the weaknesses of dynamic smart folders and static folders and combines the two into a single concept: a folder that gets stuff added to it by filters automatically, but that you can prune or add to yourself. Adding only happens once, when you archive. Boswell never dynamically touches a single thing in your database after that point. It’s a simple concept, and the only thing I know that does anything like it is Gmail’s Label and Filter system. What Gmail lacks, though, is a functional search as I described. Boswell’s Manager is the archivalist’s (my spell checker suggests this should be Archivolts, to which I must emphatically, poetically agree) dream: perform complex multi-point searches across manually assembled data sets and perform functions on them, adding richness and increasing future data importance.

I’ve said it elsewhere, what you get with Boswell is not so much an application with features, but an application that is built in a concise and efficient manner around a core concept. Everything about it supports that core concept and there is nothing superfluous about it. In a way, it is quite spare; but the stuff that is there is a nearly perfect symphony. There are only a few very minor things I would tweak about it if it were my code. We are talking little quibbles that most people would never even notice.

In the grey area: a lot of people don’t work in text archival. I do, because I am a strict MultiMarkdown user. I never archive results, only source documents. I say never; sure, every once in a while I’ll link up a PDF using a text document like a library card file in Boswell. Again, easy to do with MMD since it supports file-system-level links, and with my ID-based system it doesn’t matter where the physical file is relative to the archived document. So text-only is a bit of a limiting factor—and I do mean text-only. It has some simple OS 9-era formatting in it, but that doesn’t get out; all it exports is raw plain-text. That is too spare for the many people who are bred on the rich ambrosia of RTF these days. :slight_smile: For people like me, though, it’s a feature. You mentioned elsewhere that most of your work is in PDF and RTF. You’ll probably find Boswell way more work than it is worth.

Then there are the negatives, and there are some pretty substantial negatives. As AndraesE pointed out, the current version is quite old—built to run on OS 9, as a matter of fact—and this has more implications than simply future-proofing. Actually, there has been a steady stream of beta releases past the 2005 date you saw, but by and large it is an old program running in an ancient toolkit, through Carbon, and on a modern computer, through Rosetta. There are translation problems going through that many filters. The program is as solid as a rock on OS 9, and pretty solid on PPC Leopard, but toss the Intel equation into the mix and it can be catastrophically unstable at times. As in, your whole computer just shuts down, due to deep bugs between Apple’s Rosetta engine and this toolkit. I honestly don’t use it much on my Intel laptop, but I use it heavily on my two PPC computers. No real problems there beyond a few quirks with copy and paste.

Will it run in Snow Leopard? I don’t know. It is quirky but functional in PPC Leopard, so PPC users are set—Snow Leopard is the end of the road for us anyway on that score. It’s already marginally unstable on Intel, and unless for some reason Apple decides to fix their broken Rosetta code for people running OldStuff, I doubt Snow Leopard will change anything. Apple doesn’t fix old bugs, if you haven’t noticed. They have an attitude about released stuff being obsolete stuff, in general. Just look at their weak text engine that we all have to live with. Minor bugs getting slowly fixed over the course of entire operating system releases.

So all of that said, the ball is in Copernican’s court. I can say that they are not dead, Boswell hasn’t been abandoned, and it will not always be something written in an ancient toolkit for people running a 12 year old operating system. :slight_smile: Before I realised this, I was actually well on my way to writing my own clone from scratch. I’d rather have learned an entire programming language than use some other solution.

Something to keep in mind, in balance with what I’ve said above on the pro side: For the most part this program was done. The core philosophy was addressed. Sure little things here and there, bug fixes, but the slow release cycle is really a testament to the fact that it was a cohesive application that had reached its zenith.

So do I recommend Boswell? I recommend you play with it, if only to see the possibilities and perhaps garner some inspiration for your own system. The demo is rather limited at only 15 entries, which is hardly enough to get an idea of how it really plays. Where DEVONthink requires 10,000 files to do anything beyond nominally mimicking the Finder, Boswell requires the user to process maybe 500 entries before the philosophy all clicks together—at least it did for me. Perhaps my explorations, and my explanations of them, will give anyone else reading this a head start. But it is quirky. It is old and cranky. It probably will crash on you. It looks like a fossil. But these are all sacrifices I’m willing to make. Every other system out there feels philosophically fossilised: working on a metaphor that was never meant for personal archival, and at times only dimly grasping some of the basic concepts that Boswell expresses with near symphonic purity.

That is definitely how it has worked out for me. I should say, the combination of having four possibilities in broad strokes, combined with the date stamp, is a knife that cuts through the haze of all time. When I was using a purely file-system-based system, I cut everything up into 90-day blocks, with some change left over at the end of the year. So 09090, 09180… This meant each folder represented a broad stroke of a year, rather than the smaller months. Five years from now, I’m not likely to remember whether I did something in March or April of this year, but I’ll most likely remember it was somewhere in early Spring. That narrows it down to the tail end of the 090 folder and the front part of the 180 folder.
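A minimal sketch of that folder-naming scheme as I read it: two-digit year followed by the 90-day block, labelled by the last day it covers. How the few leftover days at year’s end are handled is an assumption (folded into the 360 folder here), since the post only says “with some change at the end of the year”:

```python
from datetime import date
from math import ceil

def quarter_folder(d):
    """Return the archive folder name, e.g. "09090" for days 1-90 of 2009.

    Folders are labelled by the last day-of-year they cover (090, 180,
    270, 360); the ~5 leftover days at year's end are folded into the
    360 folder here, which is one possible reading of the scheme.
    """
    doy = d.timetuple().tm_yday          # day of year, 1..365/366
    block = min(ceil(doy / 90), 4) * 90  # 90, 180, 270, or 360
    return f"{d.year % 100:02d}{block:03d}"

print(quarter_folder(date(2009, 3, 25)))   # 09090 (late March: tail of 090)
print(quarter_folder(date(2009, 4, 10)))   # 09180 (early April: front of 180)
print(quarter_folder(date(2009, 12, 31)))  # 09360 (year-end change absorbed)
```

This reproduces the “early Spring” example from the post: late March lands at the tail of the 090 folder, early April at the front of the 180 folder.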

Plus, I put IDs on everything, not just files. I get a new unique ID every 86.4 seconds using my system. That is usually time enough and some documents are scattered with multiple ID signifier codes so that specific notions can be tracked and referenced across time. I don’t care if I never use them. I’d rather assign the number up front and never use it than go back and wish I had an older number (but if I don’t I just use the current time; no big deal. I’ve got a lifetime supply of them).

Thank you for pointing this out, Amber. I researched this Noguchi Filing System online and came across a post by William Lise that is very helpful, especially considering that none of the books by Noguchi has been translated into any language I know. Apparently Lise had taken his post off the web (there are lots of dead ends), but he reposted it early this year.

Link: http://www.lise.jp/colleagues/noguchi.html

Anyway — this is a very interesting concept: organizing your files the way your brain does (i.e. in chronological order) and not the reverse (tagging, filing in folders by category, etc.). I think I’m going to try it and see if the work flows. :slight_smile:

Exactly. This is where I got the idea to use Finder’s colour labels to shorten scan time. The principal modification made to the index card derivative is that resources are not re-ordered according to access time, but rather marked as accessed, which decreases scan time. The original chronology is thus not disturbed. Some of these concepts can be carried over to the digital realm with great results.

Updated: To expand on this a bit, since I wrote it on an iPod originally. The index card filing method[1], which is based on Noguchi Filing, has three key differences. The first is that it is designed for thought filing as opposed to arbitrary paper filing. You have an idea of some kind, you write it down on an index card, and then use a marker to accomplish what the coloured tape does on a manila envelope. The position of the mark indicates one of the four super-categories. Each card is titled and dated at the top, and the rest is free-form. They are placed into a card file in strict chronological order of generation. When you pull a card later on, unlike in the Noguchi method, you do not refile the card at the front of the system, but return it whence it came, thus keeping the original chronology intact. Instead, using the marker, you create a small spot on the right-hand side of the card that indicates it has been pulled. If the card gets pulled again you put another dot—the designer of the system recommends up to four dots; any more is redundant, as the card is obviously frequently used.

When all of the cards are placed into a stack or card-drawer, the markings along the top are visible. It is possible to quickly scan down a huge stack of cards and find something by date and super-category based on the staggered positioning of the left-side indicator. It is also possible to, at a glance, see which cards are frequently accessed based on right-side dots.

I tried the analogue system for a while, but did not care for what I call Index Card Syndrome: sub-consciously tailoring and abbreviating your thoughts so that they may fit onto a single card. :slight_smile: It also felt kind of wasteful, as I’d go through on average about fifty cards a week. Fortunately, the principles of this system are all fairly easy to carry over to the digital file system, as elaborated on in the posts that Douger was kind enough to dig up, below.

[1]: Which can be found as blogged, here

So I just thought I’d wade into the middle of this thick conversation to let everyone know what I think of DEVONthink and DEVONagent after waffling around with it for a few days.

I really love Agent, since I no longer have to manually monkey around with online search engines. Agent looks very promising with its user-configurable search algorithms and archive capabilities.

DEVONthink has really freed me up to just focus on research with as little distraction as possible, by allowing me to have my RSS feeds from the Times, email archives, and exported Scrivener notes all in the same app. I particularly like being able to do writing in DEVONthink, as doing so narrows down my working environment. My WIP in Scrivener was starting to get a little distracting, actually, having all my stuff right there for me to look at all at once (or the temptation to do so anyway).

For me, research and writing are two totally different animals (like mixing and mastering are) and I like the fact that I can use good software to separate them into different steps of the larger process.

For anyone else who trips over this great thread, here is a link to one of the expansive 2007 posts. It contains definitions of the codes, usage, concepts, etc …

http://www.literatureandlatte.com/forum/viewtopic.php?f=15&t=2092&st=0&sk=t&sd=a&hilit=file+system+archival

I sift through so much swiftly decaying information each day it’s wonderful to find a thread like this. :slight_smile:

Well, since Amber revived the old thread, I should update what I wrote there. I no longer use Scrivener as a database. Like a couple of others hereabouts, I started experiencing some serious slowdowns and even a crash or two as my Scrivener database grew. I’m not sure how many words were in there, but it was almost a year’s worth of press releases, notes, and articles. Keith said that a file that size shouldn’t choke Scrivener, yet it was happening, for whatever reason. I also started wanting to get access to my files remotely (via Dropbox) from my new iPhone. Plus, I wanted to be able to search by keyword and get a specific file, not a result that just put me in a huge Scrivener file that I’d then have to search again. And Keith has often said that Scrivener was never intended to be used as a database.

All those factors, and my natural inclination to simplify, impelled me to start keeping my research in the form of rtf files, read in Bean or TextEdit. Then when it comes time to draft an article, I import the relevant rtfs into Scrivener and proceed in the usual way. So instead of one database that holds a year’s worth of articles for each publication, I have a separate Scrivener file for each article, even the short ones.

So now my database application is the Finder. Everything works much faster this way. And when my MacBook was stolen last month – only a couple weeks after I exported all my Scriv research to rtf files – I was able to retrieve my files from my Dropbox account and read them on my iPhone, which came in very handy when I had to write a few short pieces on deadline in the several days before I was able to replace my Mac. I just feel more secure by storing my research in a form that’s easily readable with any number of applications.

I’m doing the same thing with my book in progress: collecting all research info in rtf or text files. When it comes time to start drafting chapters, we’ll see how Scrivener handles the massive load. I’d love to be able to put the whole book in a single Scrivener project, but if it struggles with that volume of data again, I’ll make a separate project for each chapter. I suppose if I need better search capability than a standard Spotlight or EasyFind search, I can use Spotlight comments or a free app like Tagit.

As I said in the earlier thread, I was a happy user of DevonNote before Scrivener arrived, but now I’m not sure why I need any database or info management apps beyond the Finder/TextEdit combo (for my general research archive) and Scrivener (for each individual article/book/project). I’d love to hear what other users of these info drawer apps think about this strategy, and what they offer that it doesn’t.

I have a very similar workflow (notes in individual txt or rtf files). I store them all in EagleFiler. This provides more sophisticated search functions and, most importantly, a very convenient way to view the search results, with the search terms highlighted. Unlike in a Finder-based system, I do not have to open each file that Spotlight would have found and re-search within it. Added benefits: robust data integrity checks and flexible OpenMeta tagging support. Finally, the files are stored in Finder folders, so I am not committing them to a proprietary database.

This comes up every so often, so you might do a search for some of the other threads on the subject.

I use DevonThink Pro because the Finder is inadequate for the volume of data I’m trying to manage. (My main DTP database exceeds 3.5 million words.) DTP’s search and classification functions are much more sophisticated.

I don’t use Scrivener because many items are used again and again in multiple projects. Having to keep everything in one .scriv document would defeat the purpose of Scrivener, but otherwise I’d always be trying to figure out where this or that item was stored. With everything in DTP, I don’t have that problem.

Katherine

Thanks, Katherine and AsafKeller. This is very helpful. I definitely plan to continue storing all research in the Finder as rtf or text files. It sounds as though the primary advantage of an info organizer such as Devon or EagleFiler or Yojimbo is that it makes searching easier when you have a lot of material. I don’t generally use research info for more than one or two projects, so I’m happy to keep using Finder folders for it. So I think for my article writing, I’ll keep using the Finder/Scrivener combo alone.

However, your posts show how some intermediate organizing app (between the Finder and Scrivener stages) might be useful on my book project. I do have my old download code for DevonNote, though I have no idea whether my circa 2006 version will work in Snow Leopard. But I think I prefer EagleFiler’s method of storing files in the Finder rather than in a database, so if I do wind up buying something like this for my book, it’ll probably be that one.

Nevertheless, I think I’ll first try using Finder + Scrivener for my book. Certainly Scrivener’s search capabilities will be plenty sufficient for my needs, as long as it doesn’t choke on the volume of files again. I’m still compiling research at this point, so maybe by the time I get down to organizing and writing, Scrivener 2 will be ready. Or maybe the current version will work without the slowdowns I encountered when I used it to hold lots of info for multiple articles last year.

Nice to see this topic never goes away. :smiley: I took everything out of DT Pro and created folders in the Finder, but now that DT lets you easily locate files, I have returned to the fold. The ability to view and annotate pdfs, and other features like tagging, etc., make DT Pro the better choice for me. It’s also easier for me to grab things from the web and put them where they need to go with DT.

I have tried EVERYTHING out there I could find over the years, and I always come back to DT. I store it in DT, I can find it easily, I can then find it in the Finder with a menu option so I can link it to my Scr. project. I tried using Scr. for this but it just is not as powerful for data warehousing. They’ve improved DT and I have to say the changes have made a huge difference.

I like to keep it simple, so I just stick to DT Pro (Office, actually) and Scr. for most projects. Sorry if this is what other people said. I didn’t read all the posts.

Alexandria

Hello Amber.

I know I am going back a ways - to April last year, but I was trawling through the old postings and noticed this from you. Did you ever produce an MMD guide, and if so could you kindly point me to it.

Also, you may be able to answer a question on LyX. I wondered how you get your MMD files into LyX and whether you can then get LyX to produce a PDF that ignores your meta data (token), as can be done with the more complex LaTeX? Thank you.

Unfortunately no, I never did finish that. It is about three-quarters done and has been since about the time I wrote that. At the time I was working full-time for another company and just never had the time to finish things off. Now that it is part of my job to do stuff like this, it should eventually get tended to, but I can’t make any promises in the near term, as there is quite a lot of craziness scheduled in the upcoming months. It’s definitely still something I want to finish off, though, and it hasn’t been forgotten.

I don’t ever do that, and I can’t think of any good reason for doing so. Why not just produce a LaTeX file and then import that into LyX?

Thanks Amber. No problem about the MMD guide, I was just interested to know if it was out there somewhere. You certainly have more important things on the horizon. It is an exciting week :smiley:

With regard to LyX, this stems back to original attempts with LaTeX. I originally tested the bundle within TextMate to LaTeX-pdf, but could not get it to work. It could not find htmldoc. I downloaded htmldoc, but still had trouble (perhaps I did not have it in the right directory). I then came across Fletcher’s MMD bundle, which works fine - but has the issue of needing to edit in LaTeX (which is more complex than I hoped for). You kindly referred me to LyX. Combining all through LyX is much more attractive and seems something I could work with.

This may be beyond my technical background, so please bear with me. I know that you work with your token so that it is searchable in your plain text files, in and beyond Boswell. However, you have the token in the MMD header, so that it does not render when you produce a formatted document. By default it is ignored. Is that correct so far? Presumably ignoring the header is part of the LaTeX process. When I put an MMD document directly into LyX, it shows the header, so I guess this just means that I need to manually delete the MMD header when producing a formatted version of my file?

What I do with LyX is use the File/Import menu to import a “Plain LaTeX” file, straight from the .tex that Scrivener creates when compiling. No need to mess with LaTeX at all if you don’t want to. Definitely agree, it is an extremely complicated tool the deeper you go into it—and even on the surface it is quite a bit to absorb, but once you have the .tex file imported into LyX you should be in a much friendlier environment, and it is really easy to create PDFs with a single button click.

Not entirely. Anything you put into the MMD meta-data area will get saved into the XHTML file that is then used to produce other versions. For a quick example, try compiling a Scrivener document using MMD->RTF, but before you do so, use File/MultiMarkdown Settings... to add a Keywords field. Try putting your token in there and then compiling.

Now open that RTF file in TextEdit (it will be black, yes, known bug—just change the background colour). Open properties for the document with Opt-Cmd-P. The keywords you type into Scrivener ought to be in the Keywords meta-data for the RTF file. Those other fields, the ones that match up with MMD’s meta-data, will get populated too. This means Spotlight searches work on these RTFs. :slight_smile:

Now in the XHTML itself, the token would get inserted in the keywords meta element, which is a standard web info tool. You can view keywords and description in many browsers, and search engines will use these when spidering for content. I think Spotlight catches those, too.

LaTeX is a little different because it doesn’t have a dedicated keywords field—but it does get stored into a variable. This variable can later be used within the document, by calling \mykeywords. So you could in fact make the token visible in that fashion, by using Insert/TeX Code in LyX, and typing in “\mykeywords”.
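To make that concrete, here is a minimal standalone sketch. In real MMD output the macro is defined by the converter from the meta-data block; below it is stubbed in by hand, and the keyword value is a made-up example:

```latex
% Minimal illustration only: in real MultiMarkdown LaTeX output the
% keywords macro is defined by the converter from the meta-data block;
% here it is stubbed in by hand with an invented token value.
\documentclass{article}
\newcommand{\mykeywords}{09100.500 archival}  % stand-in for MMD's definition
\begin{document}
% In LyX, the next line would be entered via Insert > TeX Code:
Keywords: \mykeywords
\end{document}
```

Compiled with pdflatex (or via LyX’s PDF button), this prints the token in the document body, which is the “make it visible” trick described above.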

Yeah, but again I don’t really get why you’d want to do this. If you compile to LaTeX from Scrivener (or TextMate with Cmd-Opt-L), and then import that .tex file into LyX, you get a very nicely fully formatted document. If you just import the MMD file as plain text (or copy and paste it), all of those asterisks and headers and such will just be plain-text. You’d need to convert them all by hand and remove the MD syntax.

Now do note that an imported .tex from MMD will look a little sloppy “at the top” in LyX. It uses some codes that LyX doesn’t “understand”, and LyX imports the comments. You can ignore all of this “red text”, though; it will disappear when you create the PDF and is in fact important to the creation of that PDF.

Okay - think I am getting with it now. I will work on that Amber. Many thanks for your continued support and patience. :slight_smile:

Ioa,

Did you ever get around to posting this guide somewhere? I’m very new to (Multi-)Markdown and could do with this kind of primer.

I’d appreciate being added to any “alert” list you may have.

Thanks.