Scrivener as Dedicated Research Database

Let’s say you write four books and keep the research for each one in its own Scrivener file. If you want to access that research later, you would have to locate the right file and open it up.

So, why not create a Scrivener file dedicated to storing only your research? Scrivener’s flexibility allows you to set up a template any way you like. When you’re done with whatever it is you’re writing, simply drag and drop the research folders over into your dedicated Scrivener research file. It’ll make things really easy later if you need to go back and find something for another project, or if your WIP gets deleted or corrupted. Research can be almost as expensive as the actual writing.

You’d get the same effect, with far more powerful software, by creating research data folders in DevonThink Pro. As Keith recently stated, Scrivener is not optimized to be a research database.

While true to an extent, it does happen to be one of the stated goals of Scrivener to allow a writer to assemble their research and their manuscript all in one place. Not everyone needs a huge database application whose primary unique features only start working after 10,000 entries have been added. I would say that the best course of action for any writer using Scrivener is to simply use Scrivener and its research folder. If it starts to get unwieldy, then it isn’t difficult to upgrade to EagleFiler or DTP or Boswell or whatever suits them.

The nice thing about sticking with Scrivener as long as you can, and refusion’s method, is that you keep full advantage of Scrivener’s meta-data, split screens, and other features. Dragging items to a research project retains the large majority of the hard work that goes into filing and organising things. Dragging stuff between Scrivener and other software packages resets the meta-data to zero every time.

In my view, one failing of Scrivener as a research database is that it doesn’t retain the source URL with web clips or archives. It’s like having a whole lot of photocopies and no idea what books they came from. (Yes, I could do it manually, but I’ve now got the expectation that it will be done for me).
Scrivener’s ability to handle email messages is also limited, which I think contradicts the “binder” philosophy - my real manila folders are stuffed with correspondence, but I can’t drag a Mail message to the relevant folder in Scrivener’s binder.

Devonthink handles these things very deftly, which is why my 134 Scrivener projects (according to a quick check with Spotlight) all have empty Research folders.

I’d take another look at Scrivener for research, but Keith would have to decide whether these features are worth the coding overhead they would create for him. I don’t see a chorus of complaint, so they probably aren’t for most users.

Besides, with DT databases containing six million words and more, my heavy-duty journalist’s research needs are probably better met with Devonthink anyway.

Yeah, for me it’s really not an issue even though I do a lot of web research, and really for two reasons. 1) If I really need the original URL, I import the page normally and it gets one. Just drag the URL icon into the Binder and that’s it. 2) I always, always, always make my own backup of the page by using SiteSucker or wget. The web is transitory and things come and go. I have web research from eight, nine years ago that I can still go back and find with ease because I stored the “hard copy”, whereas if I go to the original URL it is no longer there. The site is gone, or the site changed the way they do URLs, whatever. For stuff that you only need for three or even six months, it is probably okay to rely upon the Internet to sustain your research, but otherwise there is no reason to expect it to do so.

Beyond losing the web source, the data itself is kind of fickle. You may or may not get to keep the magically imported URL 15 years from now when DEVONthink hypothetically no longer exists and you need to upgrade to the next big thing. Losing meta-data like this has caused me to, over the years, move everything into the data area and keep that annotated likewise. Then even a laserjet print-out has the full copy and citation.

Incidentally, I keep my long-term archives in Boswell, not Scrivener, which has even less razzle-dazzle in those terms. An entry is a text file with three meta-data fields. Period. Simple, but it works perfectly for me since I hard-copy everything and cross-link it if necessary. But that is for stuff that is history, no longer being worked on. I archive the Scrivener project too, and cross-link that in Boswell as well. :slight_smile:

My motto has long been that, in terms of long-term archival, the less razzle-dazzle the better.

Amber: What does “move everything into the data area and keep that annotated likewise” mean, precisely? Do you mean within Scrivener’s Research folder, or using a different application?

You guys got me thinking about DEVONthink again. I went today and downloaded it and DEVONagent. Man, I don’t know how in the world I ever lived without them! Bummer, more money to spend. I was able to find exactly what I was looking for in a few seconds with DEVONagent, so I know I will be buying that. I’m torn now because you can get both in a $99.00 bundle deal. Hmm…

What I mean is, when I’m done with something and it is ready to be archived, I don’t rely upon any database meta-data for data storage. Everything crucial to filing goes into the text area. Since I use MultiMarkdown, much of this is pretty easy to do in the header area of the document. For example:

Title: Name of the article  
Date: 09105,763  
Keywords: some,fancy,topic

Et cetera. I average about 7-12 lines per document, but only about four of them are interesting to me as a human. Most are helpers for future automation routines. I have a pretty simple but effective filing method that is very loosely based on the sort of decimal systems that libraries use, vastly trimmed down to a scope that is meaningful to me—stuff like that. For instance, this post would be filed as:

{M2.1.Scrivener}

Which I can tell at a glance means coMmunications-2(Public)-1(Forums&Wiki)-Scrivener(topical). Your response, if I chose to file it for future reference, would simply be {m2.1.Scrivener}. Letter case denotes internal or external origin. {M1.1.Scrivener}, on the other hand, would be 1(Private)-1(Email&Letter), thus most likely a bug report or something to Keith. It is compact, machine-friendly, and plays very nicely with searching mechanisms. I can search for StartsWith {R1.1. to get all Records-1(external)-1(observations), of any topicality, or on the other hand I can search EndsWith .Scrivener} to perform a cross-axis crawl for Scrivener-related documentation, correspondence, theories, and so on, across all the major branches. I’ve never needed a.i. or anything like it; in fact, that kind of stuff gets in the way, because my system cuts through huge amounts of material with surgical ease. Add chronology to these topic tokens and you can find a precise article out of 8,000 in a few seconds.
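A token scheme like this is also trivial to operate on with a script. As a quick illustration (my own sketch in Python, with invented tokens and titles, not actual archive data), the StartsWith and EndsWith searches described above reduce to case-sensitive string tests:

```python
# Sketch of the StartsWith/EndsWith searches described above.
# The tokens and titles here are invented examples.
entries = {
    "{R1.1.Weather}":   "Storm observation notes",
    "{R2.1.Dream}":     "Dream journal entry",
    "{M2.1.Scrivener}": "Forum post about research databases",
    "{m2.1.Scrivener}": "A reply from the forum",
    "{I3.1.Doc}":       "Scrivener manual excerpt",
}

def starts_with(prefix):
    """All entries whose token begins with the fragment (case-sensitive)."""
    return [t for t in entries if t.startswith(prefix)]

def ends_with(suffix):
    """Cross-axis crawl: all entries whose token ends with the fragment."""
    return [t for t in entries if t.endswith(suffix)]

print(starts_with("{R1.1."))     # -> ['{R1.1.Weather}']
print(ends_with(".Scrivener}"))  # -> ['{M2.1.Scrivener}', '{m2.1.Scrivener}']
```

Note that the matching is case-sensitive ({M…} and {m…} are distinct sets), which is why the token system depends on a search tool that respects punctuation and letter case.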

So while I do make use of database meta-data, since it is there, I don’t trust anything to remember that. I always assume that some day I’ll have to do an emergency restoration and dump everything out as raw information. The more I can fit into the document area as annotations to that, the better.

This system has served me well. I’ve probably switched archival applications five or six times and every time I was able to fairly rapidly have a fully organised kit, even though thousands of files are involved—because each file is aware of its own placement in the greater system. It assembles itself.

If you are a student or faculty member at a school, you may apply for an educator discount on any DT product. That might allow you to upgrade to DT Pro.

Amber (or anyone else who can answer this question),

Would you recommend a resource for learning more about MultiMarkdown? I checked the forum on this site, but all the links seem to be no longer functioning.

I guess I’m looking first for a quick overview and then, hopefully, for a primer of some kind.

Thank you.

I have a rough draft of a step-by-step guide to learning MultiMarkdown which I am about a week away from posting everywhere. It’s something I’ve been wanting to do for a while, and had a bit of spare time. Ultimately, I want to create a complete guide that goes from very basic “my first MMD hello world” type exercises to creating your own modifications to the export engine and even your own file formats. But I intend to release the basic part first and join it up with the more advanced stuff later. I’ll put you on the list of people to alert.

I’ve added a list of links related to MultiMarkdown (some more interesting than others) to the Scrivener Wiki. There’s currently no tutorial, but you could start with the Markdown Web Dingus:

This is Markdown, not MMD, but the concept is basically the same.


  • Now I know what M and R stand for, but I’m dying to know: what other taxonomy do you use?
  • How many codes are there in your system?
  • Did you start with a few and have they expanded over time?

I started down the data management journey using Journler. Like Boswell, Journler was an excellent tool for letting me organize my data. After a while Journler and I parted ways and I converted everything through Together to DEVONThink Pro. I posted a lot about this on my blog, and don’t want to repeat it here, but I’ve dumped DTP and I’m now basing all my archiving and research management on the basic file system functionality of OS X because its data retrieval and organization capabilities seem to work much more elegantly than DTP (or the other shoebox applications), and they seem to be less prone to obsolescence.

Alex Payne wrote a great web post, “The Case Against Everything Buckets” (and a follow-up), about the value of using single-purpose apps (of which Scrivener is one of the best) along with the file system to achieve robust research management capabilities. When I read his work it really changed my outlook on these types of applications.

What Scrivener lets me do is put just the right amount of research in with my writing project, while the mother lode of the library stays out in the file system. I fear that if Scrivener tried to recreate a research management tool or duplicate the file system, it would lose some of what makes it work so well as a single-purpose application.

I agree with you completely, Amber, that the only valuable unique functions in DTP kick in at huge data sets - all that AI stuff. I did find value, however, in using DTP (and Journler, and Together, and to a lesser extent EagleFiler…) as tools to organize my data in the first place. I have thousands of notes in rtf/d files, along with a significant library of pdfs. After looking at all the lit manager programs (Papers et al.) I came to the same conclusion: without a text-based coding system you’re just setting yourself up for a future conversion project or, worse, a data loss problem. (I once used Commence, so I know about that well.)

Going it alone in the file system, however, does require you to make your own internal text-based tagging system, and I am intrigued by yours, hence my question.


After using DevonNote for a couple of years, I came to the same conclusion when I got Scrivener. It’s just easier for me to use the file system (basically, Finder + Spotlight + TextEdit) to store my research for various projects past, present and prospective, and then use Scrivener as the repository / organizer for the projects I’m actually working on at the moment. That system is even working well so far for my book in progress.

The file system is about as future-proof as we can hope for at this point, not that I expect Devon or EagleFiler or the rest to go belly up. (And I guess you could always export everything to rtf or even text if necessary.) I have a preference for minimalism – just the tools I need, less clutter – and this system means (at least) one less app to have running and consuming space and RAM and my time to RTFM and scale the learning curve. I fully appreciate the usefulness of Devon et al. for research-intensive projects. But for my needs as a working journalist and book writer, the file system + Scrivener combo seems to be as much as I need and no more.

The arguments for and against bucket software are compelling. However, the basic OSX file system doesn’t always cut it for me. On the one hand, I’ll rely on a basic file-folder system on the hard drive for storing my music projects (seems strange, as those are quite huge, I know), but on the other hand, I’m a huge news/info junkie and need to try to merge my email and internet data together as seamlessly as possible.

I decided that my Scrivener research folders were getting a bit cumbersome, so I think DEVONthink is going to be the best solution, since I’ll be using Agent for searches from now on anyway. I went ahead and bought DEVONthink and Agent yesterday. I’m happy I’m able to get searches done so quickly and thoroughly with Agent and store my ‘keepers’ in a database in Think. The other thing that sold me was the fact that I can have RSS feeds and archive emails from Entourage in DEVONthink. I’m currently using Times for my RSS newsreader, but if I need to archive a story that I want to keep as research, there’s no quick way of getting it out of Times and into Scrivener. And with DEVONthink, archiving and accessing email from Entourage is easy, as I frequently email news articles to my Gmail account (I do this at work during the day on my haggish corporate Windows setup). Now, I can simply drag/export/etc… the link over from Entourage into DEVONthink for later use in Scrivener.

I would be hesitant to use DEVONthink for databasing any of my other stuff however. I like to keep my other ‘twiddly’ bits in actual folders on OSX in Documents. I don’t like the idea of certain things being locked up in a proprietary file system I may not be able to access later. iPhoto comes to mind on that point. I like to backup my photos to DVD in standard OSX folder structure.

I have Circus Ponies Notebook, but I stopped using it as I realized the proprietary save files it generates could get corrupted, and if you have one of those notebooks password-protected and forget what the password is, you’d be in real trouble. Also, Notebook for me is too nebulous in workflow as far as structuring things goes. I wind up with more clutter inside the notebooks than what I had outside to begin with. Then the other problem: should I save the info I just saved in Notebook somewhere else, as a ‘just in case’? That sort of kills the whole fun of it. Notebook is a neat idea, but I’ve never found any real use for it. Just another bucket I wasted money on…

DEVONthink and agent seem a bit more reliable and logical but then again, should I save all that stuff somewhere else…‘just in case’? The debate rages on.

All of this data we save means nothing, really. In the end, the software we buy to GTD means nothing either, as the formats will forever mutate and leave us saddened by the loss of those tiny digital ‘nothings’ we worked so hard on and spent so much time and money trying to preserve.

With that in mind, I hope to make DEVONthink and agent worth the money spent and actually use them to get all the data collected with them committed to text in the form of this book I’m working on. If there’s a tangible and lasting end result that justifies the means, then I say it’s worth the time and money spent on bucket software.

End of mindless rant but I hope it helps others in their decision on how to best organize and ‘future-proof’ their data.

End Analysis:

  1. Circus Ponies Notebook- Pretty but not very useful. Too ‘noodly-doodly’. Proprietary file system.
  2. DEVONthink- Can integrate email, RSS feeds and searches from Agent together in one spot. Still a proprietary database system though.
  3. When in doubt, do ‘raw’ backups of everything, and frequently, using the OSX file structure. I use Time Machine.

With the current 2.0 beta, its metadata is in a proprietary format, but the files that you store go directly into the Mac file system.


This would be terrific. Thank you very much!!!


You ask some good questions. It has taken me about four years to “perfect” the system that I currently use. I don’t for one minute think it is perfect, but in its current state it is superior to anything I used before. By the way, thanks for the link to Alex Payne’s blog. What an interesting read; I enjoyed it. Odd to consider he is behind Twitter, something I consider completely frivolous and a waste of humanity on the scale of the E-Channel, but oh well. :slight_smile: We are all entitled, as they say.

Four top-level categories: Record, Communication, Manifestation, and Information. Does it cover everything in life? No, but it covers the four main areas of input and output that exist in mine as relating to a digital archive, and that is all that really matters to the system. Record is everything I take down and record. Communication is self-explanatory. Manifestation encompasses mainly creative pursuits, but can also address explanatory expositions that go beyond mere record or information. Information is mostly just documentation, either generated or collected. Things could be a bit blurry if you take them literally; naturally, all of the capital-letter items are technically manifestations. But I do not define tokens by what things are, but by what they represent to the future me. So if something will be useful as a reference, or cross-linked to from another document for elaboration, it is Information, even if it is a Record of something, or something I manifested. I feel this way of looking at identity assignment places more emphasis on the intention of a document than on the identity of the document. I don’t really care so much about what it was, but what it will be in the future when I try to find it.

Beneath that there is a strict three-level depth restriction. Nothing can be less or more. Something must have a super-category, a minor-category, and a topic or key. Example super-categories in Record are simple: Internal and External. I observe and record psychology or dreams into Internal, for example, but observe and record events into External. The minor-category allows for division of the super-category into logical parts. One of the above examples: {R2.1.Dream} is Record-2(Internal)-1(Observation)-Dream. Some branches might only have one super or minor category in them at the moment, but that is fine as it enforces a rigid structure that reduces the proliferation of hair-splitting.

The fundamentals: reduce the top level to four broad categories and denote them as letters instead of numbers; keep to only two numbers; and keep super-types as much in dichotomy as possible, to reduce complexity. Record is split into Internal and External; you cannot really address much with a hypothetical R3. Perhaps a religious person might put supernatural in there, I don’t know. :wink: And finally, the most specific category, which is also the most prone to proliferation, is written down as a word, not a number. So there is no “5” = this person; I just write down the person’s name. Thus keywords can expand as much as they need to without increasing the need for memorisation.

I presume you mean the tokens that I have been discussing. There are other codes, but they are largely just syntax expansions of MMD to suit me. For example, I use the date and time as a unique ID. Each entry in the system has its own date and time, and thus a unique way to link to it. Rather than provide the precise URI to the file in MMD linking syntax, I have a shortcut that just requires the unique ID, <|unique_id|>. My MultiMarkdown parser has been modified to intercept these codes and expand them out to full MMD syntax, in combination with a file search routine that pinpoints the precise URI for me. Thus when I render the file to XHTML or whatever, it gets a link to the actual file on the system. Cross-referencing is mindless and flexible. If I change where I archive everything on the filesystem, I just adjust a variable in the script that adjusts the URIs. The base documents themselves are therefore ignorant of file-system-specific information. This is a big problem with many of these applications: their linking ability is based on absolute URIs. Move a file, and they get confused. Move everything, and your cross-references are useless.
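For what it’s worth, the expansion pass described above can be sketched in a few lines. This is my own reconstruction under stated assumptions (Python rather than whatever the actual parser modification is, a made-up archive root, and IDs assumed to appear in file names), not the real script:

```python
import os
import re

ARCHIVE_ROOT = "/path/to/archive"  # hypothetical location; adjust to taste

def find_uri(unique_id, root=ARCHIVE_ROOT):
    """Locate the archived file whose name contains the unique ID."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if unique_id in name:
                return os.path.join(dirpath, name)
    return None

def expand_links(text, root=ARCHIVE_ROOT):
    """Replace each <|unique_id|> shortcut with ordinary Markdown link syntax."""
    def repl(match):
        uid = match.group(1)
        uri = find_uri(uid, root) or uid  # leave the ID visible if unresolved
        return "[%s](%s)" % (uid, uri)
    return re.sub(r"<\|([^|]+)\|>", repl, text)
```

The point of the design is that the documents store only the stable ID; the script, not the documents, knows the current file-system layout, so relocating the archive means changing one variable.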

As for the tokens themselves: it is largely irrelevant how many of them exist in total, since the highly specific portion is plain English. I don’t need to memorise the difference between {I3.1.Doc} and {I3.1.Invoice}; it is not only obvious which is which, but it can also be reasonably guessed what the {I3.1. part means just by looking at those two and the (I)nformation lead. Thus the memory-intensive parts are the two numbers in the middle. I try to keep super-categories to at least two and no more than four (though I have yet to exceed three). This took a good deal of premeditation in order to make sure the skeleton would not be inadequate as data was added to it. There are a total of nine super-categories I need to remember. Information has three branches and the rest have two. There are around 30 minor-categories, so an average of three per super-category. This might sound like a lot to memorise, but since they are hierarchical and logical it really isn’t that bad. I don’t have to memorise all thirty, but rather only the parts of each super-category individually; hierarchical memory is much more suited to the mind than linear list memory. I try to keep patterns repeating, too. So R2.1 and R1.1 are very similar except in direction. Both are minor-category “Observation”, but the latter is External observation where the former is Internal observation. Likewise I2.1 is Indirect-Citation. Different concept, but similar to the notion of internal observation (at least to me).

Yes, but there has been extremely little expansion in super/minor categories, thanks to the amount of thought I put into it initially. Nearly all of the expansion has been in the suffix keywords, which was the original intention of the design. I’m not sure how many of those there are, but probably 120 or so, which means there might be 120 or 130 total tokens. But as said, that is largely irrelevant, since the hard part is the 9 and 30 amounts.

I cannot use DTP for two reasons. One, I don’t need it. My taxonomic system is good enough that I don’t need AI to tell me what is related to what, and whenever I’ve played with it, it has nearly always been wrong. If it is wrong on the stuff that I know about, why should I trust it with stuff I don’t know about? Second, the new version has no concept of punctuation or case-sensitive searching, which, naturally, my token system requires strict adherence to.

On the topic of file-system dependence: in fact, for a period of time last year, I did precisely what you are doing, and documented it rather thoroughly here in a few posts; I don’t remember where. I condensed my token expression down a bit so that it could be included in file names.

Would have been this conversation. I have the date and time (that’s the unique ID, by the way) in the numeral (using my weird datestamp format), ‘M’ for communication, and then the final specifier. It doesn’t have the whole token, but in my experience this was enough to find things, as the title helps a lot. It looks kind of messy all by itself, but in a long list in Finder, the hyphens make everything stand out nicely. I’d also use Hazel to automatically give everything labelled -M- or whatever a certain colour. I could use grep to narrow things down further if necessary. Knowing the date is an astonishingly useful bit of meta-data when it comes to personal archival. I cannot recommend date emphasis enough. For multi-person archival it is much less useful, but if you know when you filed something, even only roughly, that right there can eliminate 99% of the chaff when looking for something.
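To make the file-name approach concrete: with a datestamp and a branch letter embedded in each name, filtering needs nothing more than a pattern match. The names and layout below are hypothetical (plain digits stand in for the weird datestamp format mentioned above), a sketch of the kind of filter grep or Hazel would otherwise do:

```python
import re

# Hypothetical file names following a "datestamp-letter-topic-title" pattern;
# these are invented examples, not the actual naming scheme.
names = [
    "09105763-M-Scrivener-Forum post.txt",
    "09104201-R-Dream-Odd dream.txt",
    "09103990-m-Scrivener-Reply from refusion.txt",
]

def by_branch(letter, files=None):
    """Keep only files whose branch letter matches (case-sensitive, like grep)."""
    pattern = re.compile(r"^\d+-%s-" % re.escape(letter))
    return [n for n in (files if files is not None else names) if pattern.match(n)]

print(by_branch("M"))  # -> ['09105763-M-Scrivener-Forum post.txt']
```

Because the match is case-sensitive, uppercase -M- (internal origin) and lowercase -m- (external origin) remain distinct, just as in the full token scheme.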

The reason I wasn’t using Boswell is that I had a bit of a hiccup with it that resulted in a corrupted archive. I back up religiously, so I didn’t actually lose anything, but it was disconcerting enough to abandon it. I tried a number of other programs: Journler (probably my favourite, but its development future became uncertain, so I moved out), EagleFiler (which I quite liked because it was Finder with a nice tagging system on top), and even just Finder as you are doing (actually Path Finder, but yes). I had grown so attached to the Boswell way of doing things, though, that nothing could measure up. There are just some things that program does that nobody else gets close to. So in the end I went back to Boswell after more carefully researching what happened; I learned how to avoid the problem and I’m a happy Amber.

The problem with “everything buckets” is one of the reasons I like Boswell so much: it focusses on only doing one thing, organising information (not files). It is dirt simple once you get over the philosophical hurdles, and allows for a more transparent and fluid assignment of information than folders and files can (for me). Here is a practical example: every day, all of my journal entries (individual documents) land in a collector designed to look for them (this would be analogous to saving all of your diary entries into one folder, but I’m already ahead because the notebook filters are scattering these entries all over the system according to keywords). At the end of the day I go through that and pull out the ones I think are important to the 10-day period. I do this by selecting them, then selecting the 10-day notebook, and dragging the entries roughly over to the notebook list. In Boswell, drag and drop is “specify, then drag”, not drag to specify. You select the target, then drag to a vague area. Seems like a small thing, but if I want to drag a single entry into forty notebooks all at once, I can do that with one drag. Try making forty aliases in Finder with a single drag. :slight_smile: It also allows inverse drags. Dragging ten notebooks onto a selected range of entries assigns those entries to those notebooks. You can combine two complex and arbitrary lists in a single move. Nothing else exists like that out there, nothing. It is just one of the philosophical hurdles you have to get over, though, because we are so used to point-to-point assignment; it is all we generally have. The notion of 8-point to 10-point co-assignment is a bit weird, but absolutely fluid once you get it. This is the way the brain works.

Every ten days I go through that notebook and put copies of the entries I feel are important to a 40-day perspective into a 40-day bin. Again, this is like dropping five aliases into a folder; the original entries are still in the main collector and the 10-day bin. Then once every 200 days I pull the entries that are important to that cycle. I now have all diary entries in a huge notebook collecting all R1.1’s, 20 10-day bins, and 5 40-day bins. If I want to go back to the first part of 2006, I can read the entries in the 200-day bin from back then; if something interests me, I can drill down into the smaller-level bins to get more context. Since all of these entries are “clones” scattered everywhere, they accomplish something that is very difficult to do with a file system and folders: they allow you to approach your data from multiple perspectives and intentions. The system described above is largely chronological, but what if I don’t remember the date? That’s fine; I can probably find it in a topical notebook. I tried to duplicate this in folder-based applications (including Finder), but it gets messy, really fast, if it was even possible at all. DTP has replicants, but it just is not as elegant. It’s five clicks and a bunch of sub-menus for one alias at a time, for what two clicks and a drag do in Boswell for any number of aliases, where every single entry in the entire database is no more than three clicks away.

Another thing Boswell has that little else on the market can do: functional searching. That means I can not only retrieve information in a Spotlight fashion, but act on that information. This would be like creating AppleScripts, but infinitely easier. A simple example: say I wished to create a new notebook to hold everything regarding Scrivener. As mentioned in the prior post, I would construct a search pattern for EndsWith .Scrivener}, and tell Boswell to move all matching entries into the Scrivener_NB and a few other notebooks as well. Done. It’s like creating a thousand aliases from a Spotlight search result. The beautiful thing is that now I can use that Scrivener notebook as a search source. I can rapidly split out five other notebooks pertaining to specific aspects of Scrivener by using that notebook as the data cluster instead of the entire archive. This speeds up searches and negates the need for complex cascading search terms. In Spotlight, I’d have to re-define the EndsWith and then tack on any specifics. With Boswell, I know the Scrivener notebook has everything I need already, so I just search within it. Since notebooks are like folders, in that you can manually add and subtract information from them, they are even more powerful than additional search criteria. Searches become more useful as you prune the data clusters. This isn’t something terribly novel to Boswell; plenty of programs have container-style search constraints, but many are artificially limited. Since Boswell is set up to search this way by default, it is very easy to describe search constraints with a dozen notebooks in a single click and drag. DTP has a search constraint, but it can only work on one container at a time. Spotlight requires setting up rule after rule.

As you can divine, notebooks are really strange constructs in Boswell. They are definitely one of the philosophical hurdles that one has to approach if one is going to understand the software. They are simultaneously search results; manually assembled collections of data; documents (drag a notebook to the desktop and poof, it is a single file, in the same vein as a compile out of Scrivener); filters (if a notebook’s name exists in the text of a document being archived, the document is automatically filed within it; for example, a notebook named Scrivener will automatically acquire every single document I archive with the word ‘Scrivener’ in it); and search terms, when used as a source.

And the file system’s benefits? I have those too. Every ten days I dump everything I’ve archived into a Boswell file, then run a script I wrote which builds a directory structure and puts the entries into it. It’s a mirror I never touch; a back-up; something for everything to link to universally (using those unique IDs I mentioned); something for Spotlight to search against. When Boswell fails to help me find something (rare), it’s there. Back when I was using the file system, I had a pretty decent system. I wrote a bevy of scripts in Ruby to make things easier for myself, but in the end it was just too much work. Making a new entry meant opening TextMate, then saving and locating the right folder to save in. In Boswell, it’s Cmd-E and start typing. Done. File it, or let it sit around for a bit and stew. I don’t have the Untitled File problem mentioned in the follow-up. Untitled stuff just stays like that until I change it or want to think about filing it.
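The mirror-building script can stay small precisely because each entry names its own place in the system. The sketch below is my guess at the shape of such a script (in Python; the `Token:` header field and the directory layout are assumptions for illustration, not the actual implementation):

```python
import os

def read_headers(text):
    """Parse MultiMarkdown-style 'Key: value' metadata lines at the top of an entry."""
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the metadata block
        if ":" in line:
            key, _, value = line.partition(":")
            headers[key.strip()] = value.strip()
    return headers

def mirror_path(entry_text, root="mirror"):
    """Derive a directory path from a {Letter.major.minor.Topic}-style token header."""
    headers = read_headers(entry_text)
    token = headers.get("Token", "").strip("{}")  # e.g. M2.1.Scrivener
    parts = token.split(".")                      # ["M2", "1", "Scrivener"]
    return os.path.join(root, *parts) if len(parts) == 3 else root

entry = "Title: A forum post\nToken: {M2.1.Scrivener}\n\nBody text here."
print(mirror_path(entry))  # mirror/M2/1/Scrivener on a POSIX file system
```

Because every entry carries its own placement, the mirror can be thrown away and rebuilt at any time from the entries alone, which is what makes the archive "assemble itself" after a change of tools.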

Anyway, if you like the strictly file-system approach, I recommend learning grep if you haven’t already. Spotlight is fast, but it can also be frustratingly vague. Grep is slow, but extremely accurate. Also, Hazel, which I mentioned above. Hazel is one of the best Mac applications out there. It is the Scrivener of file automation and can make a folder of files almost application-like.

I’d caution against using DTP as your email archive. It has only an extremely rudimentary understanding of the email header, and I’m not sure if it can even output an mbox file, or whether it retains the full original email copy.

Update: Forgot to mention the one thing that always made me nervous whenever I wasn’t in Boswell: data security (which might seem ironic since it fubared my database once, but bear with me). Many of these applications support a form of locking which is, in my opinion, a very soft lock. The data cannot be edited, and in some rare cases the meta-data is locked as well, but in all cases you can still accidentally or even intentionally delete the file. Boswell just doesn’t let you. The only way to edit old stuff is to version it to a new copy. There is simply no way to delete anything, period. At first it seems limiting, but once you get used to it, it makes everything else, especially raw file systems, feel absolutely dangerous.


What a fabulous reply. Thank you so much for the effort and insight. (I’m going to dig up your other posts as well…)

When I began managing my data in a more structured way (mostly in Journler) I began using a taxonomy of …

  • Thoughts: Things I made up (Drafts, observations, my writing, articles, papers)
  • Notes: Copies of others’ work (all those pdfs, web page printouts, notes from lectures or interviews)
  • Reference: Kind of like notes, but more referential - like systems notes, keystroke shortcuts and the like.

(I never adopted a notation like your Communications, because, being somewhat shy and hermit-like, I tend to hide all that stuff in email clients. :smiley: )

Journler had a nice field called Category that let me add these markers, and at times I adopted and then dumped a few other categories - like Goals (which really were Thoughts with a “goal” tag.)

Journler also was a wonderful tagging tool, so I went off on dozens of cross-linkable tags of all sorts, and my filing system tagged along as I followed my hobby horse. These have now mostly sorted themselves out as folders. I was surprised at how a given level of abstraction in folder structure can make multiple cross-referencing redundant, especially when supported by full-text search.

Even though Phil Dow is working on a new release of Journler, and perhaps will open-source it, it became clear to me that (as the risk management folks say) I had a very high exposure to developer failure, so as I said I went to flat files. I must say I miss tags. Even though I still segregate my Writing from my Notes, there was always a richness to the linkages that tags provide. The other side of that is that I find myself manually flipping through files more often and finding serendipitous connections that no system would have foreseen. I think of it as being closer to my data.

Thanks also for the app recommendations. I’m giving Boswell a whirl, if for no other reason than that my curiosity is insatiable.

Oh, and I agree completely with your characterization of timestamp chronology as crucial to filing and retrieval. As my personal data collection ages, my sense is that chronology will ultimately become the trumping tag for all my aging files.

And thanks again for the generous reply.


This was an impressive post - and impressive praise for this software named Boswell. You recommend it, I understand :slight_smile:

But when I look at their website, the last version dates from 2005, which is quite a while ago. How sure are you that Boswell will still work under OS X 10.6?