Tagging, Taxonomy, and Metadata (oh my!)

(crossposted in the Software By Other Folk > Large-Scale Info Managers and Scriv. thread)

First, let’s begin with a tautology, as evidenced by the four pages of forum commentary preceding this. There is no one right way to organize your data. Say it with me now: “Duh!”

That said, I think we can gain insight by looking into what makes a certain type of program appeal to a certain type of workflow. So, here’s a breakdown of what we need from our programs, extracted from the prior discussion. Note: Almost all of us seem to use a combination of these tools, but the most important is the FIRST interaction, basically what you do immediately following acquisition of a piece of data.

I’ve broken this post up into a few sections. First, we get the basics, a by-no-means-complete listing of the ways we interact with our data. Then it’s on to the programs: what does what well, and which ones seem to appeal to certain types. Finally, we get into a couple of example workflows, culled from members of this forum, with a bit of my analysis following.

Important note: I may be making an artificial distinction here, but I’m choosing NOT to discuss database vs. plain folder structure, durability of information, or any of that stuff. That seems to go more towards how we deal with data in the abstract, rather than how we deal with specific pieces of incoming data, and I like to think it’s beyond the scope of this post. However, I do understand that things like compatibility, freedom of access, and all those other things do play a role in the decisions made, so maybe I’ll get into those later.

The Basics

Tagging: One of the most obvious tools, it allows us to attach keywords to documents, crosslink, and set up a sort of complicated flowchart of connections amongst our stuff. Seems to appeal to folks who accumulate huge amounts of information and need a flexible way to organize it, without losing sight of the whole. It lets us put a simplified abstraction map over our stuff, so we can find generally what we want fairly quickly. However, tagging is vulnerable to proliferation, in which each set of new data gets its own tag, such that the abstract map begins to become just as complicated as the underlying data structure in the first place.
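To make the “abstraction map” idea concrete, here is a minimal Python sketch of a tag index, assuming tags are plain strings mapped to sets of document names. The `TagIndex` class and its methods are hypothetical, not any program’s real API. The `singleton_tags` check is one rough, assumed way to spot proliferation: a tag attached to exactly one document doesn’t group anything.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class TagIndex:
    """Hypothetical tag index: each tag maps to the set of documents carrying it."""

    def __init__(self) -> None:
        self._by_tag: DefaultDict[str, Set[str]] = defaultdict(set)

    def tag(self, doc: str, *tags: str) -> None:
        # Tags are case-folded so "Contracts" and "contracts" don't split into two tags.
        for t in tags:
            self._by_tag[t.lower()].add(doc)

    def find(self, tag: str) -> List[str]:
        # .get() avoids creating an empty entry for an unknown tag.
        return sorted(self._by_tag.get(tag.lower(), set()))

    def singleton_tags(self) -> List[str]:
        # Rough proliferation signal (an assumption, not a standard metric):
        # tags that point at exactly one document add no grouping power.
        return sorted(t for t, docs in self._by_tag.items() if len(docs) == 1)


idx = TagIndex()
idx.tag("case_a.pdf", "contracts", "government")
idx.tag("case_b.pdf", "Contracts")
idx.tag("notes.txt", "misc-2007-03-04")

print(idx.find("contracts"))  # ['case_a.pdf', 'case_b.pdf']
print(idx.singleton_tags())   # ['government', 'misc-2007-03-04']
```

If most of your tags show up in that singleton list, the map is getting as detailed as the territory.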

Searching: Be it Spotlight, DevonThink, or cmd-F, we need a way to hunt through our stuff. Tagging tackles this problem by creating another layer of abstraction that lets us search for concepts, if not specifics. Searching lets us dive through the nitty-gritty files. It is also infinitely extensible, with larger collections of data merely taking slightly more time to search. Seems to appeal to folks who need to find a specific thing in a specific document. This requires the least user interaction at acquisition, but falls short if you find yourself doing the same searches over and over again.
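Search, by contrast, needs no setup when a document comes in; it just looks through the raw text whenever you ask. A naive Python sketch, assuming documents are plain text held in a `{title: text}` dict (the `search` function is hypothetical), also shows the difference between “all these words, anywhere” and an exact phrase:

```python
import re
from typing import Dict, List


def search(library: Dict[str, str], query: str, phrase: bool = False) -> List[str]:
    """Naive full-text scan over {title: text}; no tagging or filing needed up front."""
    hits = []
    q = query.lower()
    for title, text in library.items():
        body = text.lower()
        if phrase:
            # Exact phrase: the words must appear together, in order (plain substring match).
            found = q in body
        else:
            # Implicit AND: every query word must appear somewhere in the document.
            words = set(re.findall(r"\w+", body))
            found = all(w in words for w in q.split())
        if found:
            hits.append(title)
    return hits


library = {
    "a.pdf": "The government contract was void.",
    "b.pdf": "A contract with the government.",
}
print(search(library, "government contract"))               # ['a.pdf', 'b.pdf']
print(search(library, "government contract", phrase=True))  # ['a.pdf']
```

The cost of a scan like this grows with the size of the collection; real tools typically build an index ahead of time so the per-query cost stays low.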

Smartfoldering/Saved Searches: This is for those exceptionally dynamic data collections, when we have so much input we can barely remember what went where and when. Less flexible than tagging, but also requires less effort on the front end. On the downside, sometimes things just fall through the cracks.
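The key point is that a saved search stores a rule rather than a list of documents, and the rule is re-run whenever you open the folder. The Python sketch below uses made-up `Doc` and `SmartFolder` types (both hypothetical) to show new items showing up on their own, and a relevant item falling through the cracks because its wording doesn’t match the rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    title: str
    text: str


@dataclass
class SmartFolder:
    """A saved search: stores a rule, not a list of documents."""
    name: str
    rule: Callable[[Doc], bool]

    def contents(self, library: List[Doc]) -> List[str]:
        # Re-evaluated on every call, so newly added matching documents appear automatically.
        return [d.title for d in library if self.rule(d)]


library = [Doc("Smith v. Jones", "Breach of a government contract.")]
contracts = SmartFolder("Contracts", lambda d: "contract" in d.text.lower())

# Added later; it shows up without any manual filing.
library.append(Doc("Brief", "Contract formation issues."))
# Relevant, but never says "contract", so it falls through the cracks.
library.append(Doc("Memo", "Notes on procurement rules."))

print(contracts.contents(library))  # ['Smith v. Jones', 'Brief']
```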

Taxonomy: This is the most basic tool. Get document, stick in folder. Works best when your data is easily classifiable, and each subclassification contains a relatively small number of documents. Lots of documents, folders, etc. sure, but it follows the handy tree structure, and it seems to be the most easily integratable (is that a word?) with the other techniques. Also, this is the one we’re ALL familiar with, from years and years of using computers with folders within folders within folders.

The Programs

There’s been a whole BUNCH of discussion about the pros and cons of each program. What’s included here is by no means exhaustive – for that, use the handy search link at the top of the forum – but it is meant to provide a brief overview in the context of the previous section.

  • DevonThink. The 800-pound gorilla in the room. It’s lightning-fast, has ludicrously powerful searching, and includes a built-in AI that works to classify documents on the fly. DevonThink seems to cater mostly to the taxonomy/search crowd: with its lack of tagging (which is hackable, but far from ideal), relatively spartan interface, and fairly limited smartfolder support (again, hackable, but not ideal), DevonThink attempts to make up in speed and “intelligence” for what it lacks in breadth. The classification feature seems to work more or less like an on-the-fly, adaptable tagging feature. Rather than finding every document tagged X, it goes by context, says, “all of these are related,” and lets you put the label on it. On the downside, DevonThink is SO keyed for this type of large-scale work that anything less feels a bit like shouting into an empty room. And its searching is the most powerful of any program I’ve ever seen: with fuzzy searching, phrase searching, and everything else, it makes Spotlight look like sorting through a card catalog.

  • Journler. This is an odd duck. It’s half entry-based journal, half DevonThink, with a few neat features tossed in that make it pretty good at lots of things, but not quite as powerful as others. Journler supports tagging and smartfolders fairly well, though for some reason its smartfolders won’t search WITHIN added documents. On the bright side, you can have multiple documents within a given “entry,” which allows for closer grouping of specific things, as well as a handy, attached place to leave notes without creating a whole other hierarchy just to handle it. Other positives include easy customizability of icons, perfectly usable taxonomy tools, and a really, REALLY cool and innovative way of using nested smartfolders as an implied AND operator (the top smart folder only includes items found in the lower smartfolders). The entry- and date-based structure gets in the way of the documents sometimes, though, and it’s not quite as speedy or robust as some of the other options. DISCLAIMER: I’ve decided on Journler for my work, as it fits me the best of what I’ve found. I have, however, tried every program I’ve listed here, and I’m trying to be as impartial as possible.

  • EagleFiler. This one seems more geared towards mass data acquisition and tagging. A large portion of its interface is devoted to rapidly tagging and sorting incoming data. But hey! It’s got smartfoldering, too. There’s not too much else to say, here.

  • Yojimbo. This is the simplest of the bunch. Folders (but not nested folders!), tags, and a few different types of entries to organize things even more: it tries to organize the data by type and then, within those types, allow for some level of organization. Personally, I’m not a big fan of Yojimbo, since it’s nowhere NEAR as powerful as some of the other options, but if you want something pretty and simple, with a heavy bent towards tagging, it might be worth a shot.

  • Nifty Box. Even more than EagleFiler, Nifty Box is all about the tags. The ONLY way you sort things here is by tags. Folders? Nope. You can have folders of tags, sure, but tags are the primary way you interact with your data. If that’s your game, more power to you! Otherwise, you might feel a little stifled by the lack of taxonomy features.

  • Voodoo Pad. Almost no one on the Lit&Lat forums uses Voodoo Pad, and neither do I, really. It’s a wiki-based information manager, which lets you dump whatever you want into these documents and then crosslink willy-nilly. Think brainstorming, drawing arrows between and among everything you have, but with fairly little overarching structure. There are folks out there in the Mac community who swear by this program, but I’m not sure if it’s as useful for the writer-types who happen to frequent this forum.

  • Plain Text/MMD. This is both a technique AND a system. I’ve not employed this one, so I’m going to quote the redoubtable AmberV here, who continually makes me feel like I should probably go out and teach myself MultiMarkdown.

  • Folders & Spotlight. This one doesn’t need so much discussion; hell, we all use it daily. Its primary downside is that, for all its convenience, Spotlight is woefully underpowered when it comes to searching. No phrase searching, no operators, and it searches your ENTIRE HARD DRIVE. This can be slow. Scratch that: for anyone who accumulates enough data to need an information manager, it IS slow. But hey, it’s free, it comes with the OS, and it’s something we all use anyway. And it does work for some folks.

  • And, lastly, our favorite, Scrivener. We all know what this program can do – a friend of mine called it “the Aperture/Pro App of writing.” But for all its power, Scrivener seems more project specific – it’s all about the writing, not the research, and sorting through hundreds of folders and documents in the “Research” folder can be occasionally daunting. But since this is the reason we’re all here, I figured I’d include it. From my humble perspective, Scriv has just enough info-management oomph to cope with what’s needed for writing, but not so much that it feels bloated or can get in the WAY of the writing. For what it does, I find it more or less perfect.

The Workflows

To put the mass of the above into context, I thought I’d drop in a few workflows that highlight the various techniques in action. Because I’m writing this thing, I’ll start with mine.

First off, I’m a law student, so my world is all about PDFs and searching through them/extracting specific quotations. I don’t need to worry so much about crosslinking them, since cases on murder usually have relatively little to do with cases on government contracts (but not always!). So I tend to shun the metastructure imposed by tagging until later in my workflow, since the research itself tends to limit my intake to at least a RELATIVELY narrow field. Taxonomy works well here, with a separate folder in Journler (or DevonThink, or whatever) for each project. Searchability is KEY, and smartfolders can help too. This is where the entry-based structure of Journler tends to work out pretty well for me, since if I have a few cases on a specific item, I can group them all into one entry as “resources,”

See my answer in that other thread.
Maria :blush:

I use Tinderbox mostly. Okay, it’s on a project basis, but my work generally involves collecting information, gutting it, then, most importantly, mapping the connections. Very different from, for example, fiction writing. So the workflow is:

(1) Whatever I fancy clipping/dumping the stuff into in the first place (usually Yojimbo, as a sort of huge in-box)

(2) As soon as possible, moving it into Tinderbox, which I can make do pretty much anything I want (and now that it has something similar to tag clouds and something similar to DevonThink’s “see also” function it’s even more powerful) and mixing it all together

(3) Export the results of this mulling-over of notes into Scrivener.

See, most of it’s about reading and re-reading and re-re-reading my notes. The writing-down or typing-out is the very last stage. I seldom commit prose until we’re ready to go with the whole project. Writing in snippets works for many people but not for me, so this solution works well.

The thing about Tinderbox is it’s basically a big XML file (so to a large extent future-proofed) and it’s Finder-friendly. The tricky thing about it is it’s more of an environment or a language than a pre-built app; but that’s also its strength.

YMMV, as they say.

You bring up a good point there. I would extend it a bit and say that any archival application using XML or an SQL-compatible database is going to be very useful for many years to come, whether or not you stick with the application that generated it. There are a few caveats, of course. Obviously, neither is terribly easy to fathom unless you are at least moderately geeky. But with a proper viewing application or front-end (of which there are plenty for both), that difficulty can be alleviated. The other thing to watch for is that many applications tout standard formats, but in actuality are just loading up the standard format with proprietary binary data, which isn’t going to be terribly useful.

Tinderbox is not one of these. The XML file it produces is very easy to fathom, and large parts of it could be read without aid if you wanted. Tb’s biggest disadvantage is that it doesn’t handle file sorting all that well. It isn’t really meant to be a file organiser, like EagleFiler or the Spotlight tagging systems. It’s all about textual information imported into it. For most things, this is fine. I use dedicated media organisers (such as Aperture) for visual sorting. But, some people really like their HTML files to stay HTML, and their PDFs to look the way they meant them to, and Tb isn’t really going to address that need. It has file pointer data types, but these are fragile in my experience. Something as simple as opening the Tb file on a different computer and then moving it back to the original computer can break these links.

I’ve spent silly amounts of time testing different applications for the ‘perfect workflow’ since I got my Mac. It will probably take me at least a year to make up for the lost time, provided that I don’t restructure my things again in the near future with the next greatest app. I seriously wonder how many ‘lifehacking’ nuts actually save time.

That aside, here are my results. Perhaps they will save you some research-time.

Out of all the applications I’ve tried, I’ve settled on the following three ‘archiving/editing’ tools: DEVONthink, Journler and Scrivener. Let me explain why I settled on these, and why I use more than one app.

DEVONthink, as has been pointed out, is a huge, lumbering, all-encompassing archiving and searching app. I use it to store everything that I might want to look up later. And I mean everything: web pages, academic papers, whole books!

To keep my database from becoming too clogged, I’ve set up two separate databases. The first contains entire ebooks, many of which I also have in ‘physical form’. The second database contains my academic papers, observations, web pages and the like. The reason for this divide is that the intelligent ‘see also’ and ‘classify’ features really shine with documents of around 500 words. Books are just too big, and the AI slows down too much. What I end up doing is putting summaries of books in the ‘small document’ database. Eventually, I might even create a third one for the mid-sized documents (papers of around 30 pages), but so far everything seems to work.

The problem with DEVONthink is that it’s not really easy to use, especially for writing.

For my day-to-day activities and notes, I use Journler. Journler’s workflow is based around daily entries, so it’s a good way to see how productive my day has been. On top of that, it has a smartfolder/tagging system that makes it very good for archiving.

To that end, I tend to use Journler for articles, images and everything else that I collect on a given day. It’s basically a Finder replacement. The main reason, aside from the ‘daily-activity’ feature, is the nested smartfolder support, and the robust tagging and indexing.

I have a ‘science’, ‘creativity’ and ‘work’ smartfolder at ‘root’ level. Below creativity, for example, I have a ‘guitar’ smartfolder. Clicking on it shows me all the entries in the creativity folder, tagged with guitar. This is an effect of the nested smartfolder support (that no other app has). If I drag the ‘guitar’ smartfolder to, say, work, I would get all the entries tagged with ‘guitar’ that are in the ‘work’ category. Obviously, none exist. I’m not a musician by trade.
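The nested-smartfolder behaviour described above amounts to an implied AND: a child folder shows only entries that satisfy its own rule and every ancestor’s rule. Here is a minimal Python sketch of that logic, assuming tag-based rules and hypothetical `Entry`/`SmartFolder` types. It is not how Journler is actually implemented, just a model of the behaviour described in the post.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Entry:
    title: str
    tags: FrozenSet[str]


@dataclass
class SmartFolder:
    name: str
    rule: Callable[[Entry], bool]
    parent: Optional["SmartFolder"] = None

    def matches(self, entry: Entry) -> bool:
        # Implied AND: this folder's rule plus every ancestor's rule must hold.
        if not self.rule(entry):
            return False
        return self.parent is None or self.parent.matches(entry)

    def contents(self, entries: List[Entry]) -> List[str]:
        return [e.title for e in entries if self.matches(e)]


def has_tag(tag: str) -> Callable[[Entry], bool]:
    return lambda e: tag in e.tags


entries = [
    Entry("Blues scale practice", frozenset({"creativity", "guitar"})),
    Entry("Short story draft", frozenset({"creativity"})),
    Entry("Quarterly report", frozenset({"work"})),
]
creativity = SmartFolder("creativity", has_tag("creativity"))
work = SmartFolder("work", has_tag("work"))
guitar = SmartFolder("guitar", has_tag("guitar"), parent=creativity)

print(guitar.contents(entries))  # ['Blues scale practice']
guitar.parent = work  # "drag" the guitar folder under work
print(guitar.contents(entries))  # []
```

Re-parenting `guitar` under `work` returns nothing, which matches the “obviously, none exist” result in the post.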

A bunch of the entries might eventually be moved to DEVONthink, and in some cases (for example, an ebook I’m currently reading) from DT to Journler. In the latter case I only have to provide a link to the file that already resides in DT.

For actual creative writing, I use Scrivener. I don’t think I need to explain why. Any articles that are published (published to my blog, or my uni paper, that is) are then added to Journler for quick reference, and in some cases to DEVONthink, if I think they have lasting value.

Finally, I use a program called TextMate for some of the writing work, and to convert my Scrivener MultiMarkdown documents to (X)HTML.

As it is, my workflow isn’t perfect. I still use Journler for some archiving, even though it might make more sense to use DT exclusively, and Journler just for day-to-day work. As it is, I see Journler as a buffer; anything really worthwhile will eventually (yearly? monthly?) be moved to DEVONthink, for later reference. To what degree this actually works remains to be seen.

So there it is: my workflow with three/four fantastic Mac-only apps. Suggestions and criticism are welcome…

Brian, thanks for this.

I’m still doing an in-depth reading of it (and the responses), but I wanted to note that with Leopard + Spotlight, you can limit the search to a user name and choose whether to search file contents or file names. You can also choose what formats to search (PDFs, images, web pages) in Sys Prefs -> Spotlight, as well as rearrange the order in which they appear in the Spotlight window.

(Back to reading…)