Filesystem-based Information Management Question

Yes, as said I’ve been meaning to answer some of your original questions and just haven’t found the time, but now I have some, so I’ll give it a go.

First, meta-data on media. My solution to this is in part derived from the fact that I simply don’t archive too much media. It constitutes a small minority of what is in my system. So for me, a “filing card system” approach works just fine. I treat media just like a public library would treat its books. The books are not in the database, but the information regarding the book is. If I wish to file a PDF or a website, I write up a quick reference card in an MMD text file. The body points to the file, so that it can be clicked and opened; the MMD meta-data block holds some meta-data concerning it; and there is perhaps some commentary on it. If it is an image, I’ll include the image right in the MMD output as well. There are important reasons for approaching things this way. Manually creating “a card” like this anchors the item in your memory. If you just press some “Stash Everything Into My Super Everything Program” universal shortcut… well the stuff barely exists in your brain. If you take the time to actually write this stuff down you will remember it. Using this system is not just about having a good archive, it’s about keeping your brain sharp. It’s designed to benefit you in many ways.

Lots of people really like their convenience though! I get that, I do. It’s just not for me and the system definitely reflects that.

To handle “media” (which I put in quotes because I really just mean anything other than a text file when I say that—sometimes that might mean old Scrivener projects, or what have you) I have a separate folder which is organised by token. So my folder structure looks like this, in abstract:

archiveBoswell
    2010
        10090
        10180
        10270
    2009
        09090
        09180
        09270
        09360
    ...
archiveFiles
    {R2.1}
    {C2.1}
    {I2.2}
    ...

The key thing here is that the text documents themselves are in the dated folders. I organise by year and then by quarter of the year, and that is it. There are just tons and tons of MMD files in there. Now if a Record.External.Observation type file ({R2.1.Society} for example) needs to link to some media, I’ll drop the media into archiveFiles/{R2.1}. So that is how I annotate and organise PDFs, images, and so forth. Again, I don’t do a lot of this, so this workflow might be way too cumbersome for many. 99% of what I archive is plain-text. That impacts how I work in a big way.

Briefly, if you haven’t read my prior thoughts on tokens, the above is an example of a typical token in my system. When it comes to filenames I only use the initial letter, not the whole token. At the file list level, I’m not as concerned with whether it is an external or internal record; that level of detail is not important to me when combined with chronology and title. I see Doug uses at least one number in his names. I could see myself doing that, but thus far I’ve been happy with just “R” instead of “R1”. Since I basically generate files outward from their contents though, it wouldn’t be too hard for me to make a transition like that.
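For anyone who would rather script the filing than do it by hand, here is a minimal Python sketch of the folder layout described above. The function names are made up for illustration, and one detail is an assumption: the listing only shows the quarter folders ending in 090, 180, 270 and 360, so the sketch guesses that the last few days of the year go into the 360 bucket.

```python
from datetime import date
from pathlib import PurePosixPath

# Day-of-year boundaries taken from the folder names in the listing.
QUARTER_ENDS = (90, 180, 270, 360)


def quarter_folder(archived: date, root: str = "archiveBoswell") -> PurePosixPath:
    """Return the year/quarter folder for an archival date."""
    day_of_year = archived.timetuple().tm_yday
    # Assumption: days after 360 fall into the last bucket of the year.
    end = next((q for q in QUARTER_ENDS if day_of_year <= q), QUARTER_ENDS[-1])
    return PurePosixPath(root) / f"{archived.year}" / f"{archived.year % 100:02d}{end:03d}"


def media_folder(token: str, root: str = "archiveFiles") -> PurePosixPath:
    """Return the media folder for a token such as '{R2.1.Society}'."""
    # Keep only the type letter and the two numeric axes: '{R2.1.Society}' -> '{R2.1}'.
    head = token.strip("{}").split(".")[:2]
    return PurePosixPath(root) / ("{" + ".".join(head) + "}")
```

With these, a text file archived on 2010-09-12 would land in archiveBoswell/2010/10270, and media attached to a {R2.1.Society} record would go to archiveFiles/{R2.1}.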

On the Complexity of MMD

I would say this: don’t confuse the advanced usages of MMD with MMD itself. In fact much of what I say has nothing to do with MMD… like above, that whole bit about scripting a name convention change, nothing to do with MMD. Just because I go all out and write scripts to do stuff doesn’t mean you have to do that (trust me I’d be writing scripts no matter what the core format was). At its core, MMD is just a plain-text file with a little text-based accentuation to it, no more complicated than e-mail or what have you. For example, the meta-data block looks a bit like this:

Title: The Name of the Article
Author: Your Name
Date: 2010-09-12

That’s it, and you can even make up your own fields too. For instance, when archiving e-mail I use a “To:” field. It’s all pretty free-form and the only stipulation is that there are no blank lines in the meta-data section, and that you have a “Field: Value” construct on each line.

So when applied to this system, it makes for a very useful place to add a little extra data to the file, data which doesn’t appear visibly, by default, in the produced copies that MMD can create (though some of it can be used in self-evident ways; like Title being used to name the web page).
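To show how little machinery the meta-data block needs, here is a rough Python sketch that splits a file into its fields and its body using only the two rules mentioned above (no blank lines in the block, one “Field: Value” per line). The function name is invented for the example, and this is a simplification rather than a full MMD parser; real MMD has a few more conveniences that are ignored here.

```python
def split_metadata(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (metadata, body) at the first blank line."""
    lines = text.splitlines()
    meta: dict[str, str] = {}
    for index, line in enumerate(lines):
        if not line.strip():
            # The first blank line ends the meta-data block.
            return meta, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if not sep:
            # Not a "Field: Value" line, so treat the whole text as body.
            return {}, text
        meta[key.strip()] = value.strip()
    # No blank line at all: the file is nothing but meta-data.
    return meta, ""
```

Only the first colon on a line is treated as the separator, so values that contain colons (a URL, say) come through intact.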

@Lettermuck: Do you think that a mix of your file naming system, with addition of limited tagging could do the trick?

As you might better see now, that’s actually a pretty good description of what I do. :slight_smile: So long as you call “tagging” putting these things into the text file in the meta-data block (which is, remember, invisible when viewing it in its non-text form). No risk there, so why bother with anything else? Really? I don’t get any of the advantages with using super-filesystem methods and all that. With Spotlight you get all the same benefits just typing the keyword into the text file, and you get a file that is just as useful on an old DOS computer, or an iPad.

Since the media is not effectively a part of the archive, rather the archive refers to the media, I can use however much meta-data and commentary I like on whatever type of file it is. No limitations because I have an entire text file (or even dozens of text files) with which to “tag” this PDF file, and zero fragility since it is all plain-text and ordinary files. No reliance on system hacks etc, and it’s all ruthlessly simple. There is no coding or complexity. It’s just text files that say “See this file over here…”.

Using MMD is far from learning how to program. A better way to look at it is a system for standardising the contents of your file. You could of course make up whatever internal system you want to use, like Doug has with the hash codes. The advantage of conforming to MMD’s system is that you get a bunch of pre-built methods to take your plain-text file and turn it into other things. That may or may not be important to you. If all you need is RTF, then maybe using MMD to create RTFs wouldn’t be the best way to spend your time. If you need something to sometimes be an RTF, and other times be a clickable web page, and maybe other times a nicely typeset PDF, then it’s a good system to have at your disposal.

This leads to your question regarding MMD exports. I don’t keep the exports around; in fact in most cases they only ever exist in whatever TextMate uses for a temporary folder, while I view them. The only real exception to this is when the exported version itself represents something that then required a lot of extra effort to manually adjust. In most cases, what MMD produces is just fine, but sometimes I want to do stuff that is special to that document, and if I spend a significant quantity of time doing that, I’ll save the final product (or more often whatever generated that final product) as well as the MMD copy. The MMD copy will remain the “master” in my archive, the product will get saved as “media”. So for example, this message, I wrote it in MMD and then “published it” using the BBCode generator. I don’t save the BBCode version anywhere, I don’t need to. If I need another BBCode copy, I get that with a single keystroke in TextMate. Meanwhile the base file can also be used to preview as a “web file” in TextMate, which is what I’m doing to proof this copy.

When I view a file in TextMate, I press Ctrl-Opt-Cmd-P, which runs the file through MMD, creates a webpage file in a hidden temporary location, and then renders the file right in TextMate’s web preview system. All of the links are functional: they will open files in their original applications if necessary, and cross-references to other archival files also work reasonably well.

Some things, the way I describe them, might sound more involved than they actually are. For example when I say that I reference the original file in my MMD meta-data block, all I mean is that I type in something like:

Source: 10256231-I-Some article.pdf

That’s all. :slight_smile:

The advantage of this, again, is that if you search for “Some article.pdf” in Spotlight… you get that file, but you also get everything that points to it… talks about it, explains it, whatever, the whole cloud! That’s the beauty of using simple old-fashioned tools for this stuff.
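Spotlight does this for you on a Mac, but the same “whole cloud” lookup works anywhere with a few lines of script. Here is a minimal Python sketch, assuming the archive is just a folder tree of UTF-8 text files plus media; the function name is hypothetical.

```python
from pathlib import Path


def find_mentions(root: Path, needle: str) -> list[Path]:
    """Return every file under root whose name or text contains needle."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if needle in path.name:
            hits.append(path)  # the media file itself
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if needle in text:
            hits.append(path)  # a card that points at or discusses the file
    return hits
```

Calling find_mentions(Path("archive"), "Some article.pdf") would return the PDF along with every text file that names it, which is the point: the links are just text, so any search tool can follow them.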

File Names and Meta-Data

To briefly summarise my philosophy:

  1. The filename is the envelope. It’s what you see in the “drawer” and should contain as much information as is convenient to differentiate itself from the other envelopes.
  2. The meta-data within the file (I use MMD’s conventions, but one can use whatever they like) is the full description of that file. This is what aids search routines and automatic organisation (if you employ such).

The core philosophy separating these two is that the filename should be concise and legible in large lists but contain enough to differentiate itself; the meta-data should be as complete as you have patience to supply it with. I’ve put a lot of effort into making my system as “low impact” as possible. I don’t want to spend any more than ten seconds adding meta-data, and to further this I use boilerplates and template files with most everything set up except for a few key things that change in every file, like the title and date. TextMate’s Snippets make this incredibly easy to do.
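Because the envelope follows a fixed pattern, it is easy to take apart again in a script. The Python sketch below assumes the shape seen in the example name 10256231-I-Some article.pdf: a run of digits, a single capital letter, then a free-form title with an optional extension. The exact makeup of the numeric ID is not spelled out here, so the sketch treats it as an opaque string; the names Envelope and parse_filename are made up for the example.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: digits, one capital letter, then a free-form title and extension.
FILENAME = re.compile(r"^(?P<id>\d+)-(?P<primary>[A-Z])-(?P<title>.+?)(?P<ext>\.[^.]+)?$")


class Envelope(NamedTuple):
    id: str
    primary: str
    title: str
    ext: str


def parse_filename(name: str) -> Optional[Envelope]:
    """Split an archive filename into its parts, or return None if it does not fit."""
    match = FILENAME.match(name)
    if match is None:
        return None
    # The extension group is optional, so fall back to an empty string.
    return Envelope(match["id"], match["primary"], match["title"], match["ext"] or "")
```

Anything that does not match comes back as None, which makes it simple to spot files that were never given a proper envelope.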

Something to consider: a number of modern Mac programs, including DTP for that matter, have prioritised searching speed over searching precision, and this becomes especially true with punctuation. If you do adopt a token system similar to what I have developed, that is something to take into consideration. DTP is perfectly useless for the way I work, because searching for “{R2.1” is meaningless. The punctuation isn’t considered in the search and will in fact mess it up. Leap has this same problem. Notational Velocity does not have this problem. It will do precise searches very fast. Boswell also does precision searching, but it goes about it the slower way.

Why Numbers

I’ll digress briefly on why I use numbers in the first and second axis of the token, as someone asked this question in response to Doug’s blog post. On the surface it does seem less elegant than the rest, because numbers require memorisation. However I definitely have a system for these numbers, which dramatically reduces how complex it is to remember them. In the first axis, 1 is always private; 2 is always public; 3 is always concrete-auxiliary; etc. R1 is an introspective record, a dream I’ve stored, a comment on my psyche, etc. R2 is an amusing individual I saw while writing in the coffeehouse. M1 is an e-mail, M2 is a forum post. C2 is something I intend to publish; C1… not so much. :slight_smile: You get the idea. The secondary axis also has analogous meanings shared between them. Example: 4 is unfinished. When applied to an e-mail (M1.4) that means I never sent it; perhaps I wrote it in a fit of pique and decided it would be best to temper myself first. In {I3.4 it would be an unfinished concrete report of some collected information.

The other reason for using numbers, Doug already pointed out in his response in the blog comment section: It makes searching even more powerful. M2 is an unlikely sequence all by itself, but with the dots it is even more unlikely. M2. is very unlikely, but Mp. might conceivably be more likely, especially with a case-insensitive search. That could have been “hemp.” instead of Private Communication, or whatever.

Numbers are also computer friendly. Alphanumerics can be used nearly anywhere, punctuation is a bit more spotty. I did briefly consider using punctuation instead of numbers, but if I ever did want to put full tokens into filenames or use them in some area that was picky about characters, it would become limiting.

Finally, numbers allow a great degree of expansion, especially if you demarcate them. While I don’t have any double-digit signifiers at the moment, the system could easily accommodate an R3.12.. I’d hope it never gets that way though, as that probably means there is a taxonomic failure and an axis needs to be split.

The punctuation placement also allows for some interesting searches. Since the minor-axis number is the only one surrounded by dots, searching for .4. will return all entries in my database that were never finished. Change that to .4.SomeName} and now it returns all unfinished items written to, about, or regarding that individual.
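A plain substring search for .4. is all this needs, but if you ever want to treat tokens as structured data, they are regular enough to parse. Here is a minimal Python sketch, assuming tokens look like {R2.1.Society} or {C2.1}: one capital letter, the two numeric axes separated by a dot, and an optional plain English tag. The names Token, find_tokens and is_unfinished are invented for illustration.

```python
import re
from typing import NamedTuple

# Assumed token shape: {<letter><major>.<minor>} with an optional .<Name> tag.
TOKEN = re.compile(r"\{(?P<kind>[A-Z])(?P<major>\d+)\.(?P<minor>\d+)(?:\.(?P<name>[^}]*))?\}")


class Token(NamedTuple):
    kind: str
    major: int
    minor: int
    name: str


def find_tokens(text: str) -> list[Token]:
    """Return every token found in text, in order of appearance."""
    return [
        Token(m["kind"], int(m["major"]), int(m["minor"]), m["name"] or "")
        for m in TOKEN.finditer(text)
    ]


def is_unfinished(token: Token) -> bool:
    # Per the description above, a minor-axis value of 4 means "unfinished".
    return token.minor == 4
```

One difference from the literal .4. search: that search relies on the trailing dot, so it only finds tokens that carry a name tag, whereas this version also catches a bare {M1.4}.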

Organising Things

To expand a bit on the matter of folder organisation, I want to stress the importance of simplicity, which is a message I fear could be easily lost as I extrapolate, so I’ll come back to it at the end. The most important aspect of this system is that you shouldn’t be organising. That’s one of the major goals of it: reduce or completely eliminate the overhead of filing.

Now for the digression. I cannot answer simply, because Boswell has a folder scheme much more akin to Gmail’s label system. Technically any instance of an entry in a Boswellian folder is an “alias”, and so it very naturally works with the concept of having items scattered all over the place, whereas in many other systems this sort of cloning is a secondary feature, not a natural definition of what the relationship is between item and folder. In Boswell, an item can be in no folders at all, totally invisible unless you search for it.

So, I don’t actually strictly adhere to the chronology system alone. It is definitely an important component, and drives the sort method within most of my folders, but I’m able to easily maintain topical buckets which are automatically populated as I archive items. It’s just as often I’ll go to a folder like “Letters-Sent”. Broad notebook buckets like this are useful in Boswell because, like I said, it searches the slow way. So stipulating a top-level bucket that narrows the pool down from 10,000 to 800 items will mean the search performs that much more quickly. Within this bucket, as said, I nearly always sort things chronologically.

So they are really more like tags that act like folders. Coming back to the top emphasis: I don’t manually handle any of this. Boswell manages all of the organisation for me whenever I archive something. So my system remains pure to the paradigm of dump and forget, while also benefiting from a little computer-aided topical organisation that is 100% set up by me. I know it is flawless because I set it up, thus I can trust it.

So how does this stuff end up in actual folders, if I’m doing it all in Boswell? Simple, I dump the day’s worth of stuff out of Boswell and into the appropriate quarterly folder first thing in the morning. I don’t myself use that folder much, except in cross-references. Since I cross-ref heavily, the MMD links all point to the items in that folder, so when I click on a cross-reference in TextMate, it’s from that system that I view the files.

This also means that I have a Boswell-free redundant backup.

To reiterate: If you engage in complex organisation, you are missing the point of the system. One might be tempted to sort the "R"s and the "M"s into sub-folders beneath each quarter folder; I would caution against that. Part of what makes this system unique and powerful is that the chronological listing remains unbroken across taxonomic boundaries. Everything is listed together, no matter what type of file it is, in the order it was archived. This can yield interesting combinations! Remember that if you do need a little focus, it’s easy to do. You can either mentally block out the non-“R” stuff, or whatever, or use Spotlight to produce a quick focussed list.

Browsing and Previewing

Browsing is probably the weakest part of my system, but for a good reason. I hardly ever do it, and never as a way of finding things. The only time I’m browsing is if I’m feeling nostalgic—then maybe I’m going through a particular axis or two in a date range without caring what I come across so much, or if I’m looking for something in particular, and then the ID-Primary-DescriptiveTitle naming convention is 99/100 more than good enough to get the job done. In Boswell, if I’m not sure from that information, but have things narrowed down to a pool of 30, I can just DownArrow through the list and the text shows up immediately. Same can be done in Notational Velocity.

While there are definitely occasions for browsing or previewing, in most cases—for what I need—an excessive amount of this activity would probably mean a failure in the system. My system is designed so that I can hit a needle in a haystack within twenty seconds, no matter what that needle may be. Most often it’s more like 5–10 seconds. Very rarely is it longer, and then there is some browsing required. Even a very good system will at times suffer the fallibilities of human memory, I might just not remember enough to pull it out quickly. More precise and elaborate searches can usually accomplish this, but that means more time to acquire it. In most cases, it is like Doug says, your search result is usually one file.

This goal, if you could call it that, might not be relevant for everyone. For a graphic designer, thumbnail previews might be the equivalent of a textual descriptive name. A similarly designed system that placed more emphasis on a thumbnail for rapid retrieval would certainly be valid for some people. Programs like Leap or even just Finder in icon mode might be all they need.

In defence of naming conventions, for Druid’s sake, calling the naming system a “restriction” is to mistake the system. The main grief I have with application meta-data, and for that matter many of the meta-data systems featured in OS X applications that use Spotlight, HFS+, and other tricks, is that they aren’t very portable. All of those tags and folders and comments and labels get lost as soon as you leave the system, upload the files to an FTP server, or work in Windows for a while. Doug already pointed out the fallibility of relying upon system tools for chronology as well. They were never designed to be archive-proof for one: they are more meant to be activity indicators on the system. When was it made, when was it modified; neither is immutable. The archival date, which is a central and architecturally vital component to this system, cannot be trusted to any of these flags. The naming really is the system, everything else is gravy. Saying it is “restrictive” is no different than saying, “You don’t have to worry about the restriction of putting your files in DEVONthink to use DEVONthink”. :slight_smile:

If someone is interested in adopting a Noguchi-styled organisation system, then they must have a 100% foolproof date system, and the filename is a profoundly logical and good place to put it. From above, the filename is the envelope, and this system is all about envelopes, and less about the boxes, when you get down to it.

Is this method for the super-organised? I’d say not, actually. It might look like it on the surface. Someone might look at my boilerplate without reading about how I create it, and think, “Wow, that’s a lot of meta-data work and filing”. It’s really not though. It’s, as described, five to ten seconds of entering information. I can mark up thirty files in a few minutes, and that’s usually the maximum of what I ever have to do in bulk—that usually happens whenever I come home with my AlphaSmart. :slight_smile:

If anything, I’d say this system is about dumping the super-organised mindset. One of the main reasons I designed it was because I was spending too much time organising things. I wanted something that was save-and-forget, not even file-and-forget. There is still a little filing, but honestly it’s not much at all, and it’s all habitual at this point anyway. I don’t even think about it. Does DEVONThink (Pro) (Office) (Skyscraper) supply that kind of ease? Yes, I think it does. I’ve read enough about it from people who are avid users of it to gather that it can provide that kind of thoughtless collection and retrieval. I’m not saying this system is better except in two key points: (a) it will still be working in thirty years, and (b) I believe it has psychological advantages. See also the thread on commonplacing. There is, I believe, something mentally healthy about creating the structure by hand and taking the time to give it proper care. I think the second point does have a limiting factor though, in terms of quantity. If someone is a pack rat and gathers hundreds of things per month, they would have no time for anything other than writing synopses of it all!

As Doug says above, there is almost a leap of faith in just how little you organise things. It feels very strange to work in this system at first. Your intuition is telling you you aren’t doing enough! All of this will be lost! It does work though. I’ve never lost a single file. The only times it’s failed me are when I failed it and neglected to put something into the system. There is no way around that problem though; everyone makes mistakes and any software or system will fail to produce information you never gave it. :slight_smile:

Responses

OpenMeta

@douger: But I wish I felt better about OpenMeta, there is always the sinking feeling that Apple will pull the rug out one day and all the tags we’ve put on files in that “reserved space” will be washed away. So my needs are well served with Tagger, but I worry from an architectural standpoint.

I would definitely encourage some manner of redundancy here. In my opinion OpenMeta is probably even less safe than DEVONthink in terms of portability and future-proofing. Plus there is yourself to consider: Do you want to predict that you’ll be using a Mac in ten years? I’d like to think I will be, but who knows. Ten years ago I was a Linux geek and Macs drove me up a wall. Now I have Linux on a Parallels VM and hardly ever use it. And that’s just me. Like you say, this OM trick is an Apple thing, but they are not known for holding fort over things they’ve established. They are in the consumer business, not the long-term enterprise business.

I have considered looking into writing a Ruby based OpenMeta script that will process my hard-coded meta-data and make it OM-useful. It would be nice, but I definitely wouldn’t strip out the core. It would, if anything, become like a Boswell to me. A useful tool that sits on top of a system.

Avoiding Specificity on the Envelope

I’ve corrected early mistakes, like not making the file naming conventions abstract enough - too much specificity and the thing breaks down under its own weight.

That’s a huge “beginner’s problem” with this system. Hey, that was version one and two for me, before I started writing things up in the forum here. By then it was around version four. The first versions had a huge amount of specificity and it was a royal mess. Fortunately I kept very good records, so upgrading systems has been easy, but this latest iteration seems to be the winning ticket for me. 09004 was a final synthesis of many ideas, much of which I posted here in late 08, and honestly it hasn’t changed since then, except in ways it was designed to change. The final plain English tag in the token is totally designed to be infinite. There are many more of those now than there were then, but this doesn’t burden the system. It doesn’t hurt it at all to have .PersonA} and .PersonB} specified. That only makes it better, and they are both still {M1.1 or whatever.

So yeah, I love it too, work in it daily and benefit from it nearly hourly. I’m glad to hear you’ve remained just as pleased with it in the long run, too.

Future Proof

And I don’t worry about application functionality as much, because my data is always secure. I’m using apps against data, not housing my data in an app.

That’s really the key thing here. I don’t trust software for the multi-decade question. It’s nice, some of it is really amazing (I’m lucky enough to work for one that fits squarely in that last superlative), but for archiving data: that’s the long haul. I’m going to be carrying this archive around with me when I die. By then it will probably be in some little crystal embedded in my fingernail or something, but I’m not going to trust Boswell, or DEVONthink, or even Apple, to be on my fingernail as well. Yes, the files can be exported and moved to new systems, but so much of the connective tissue cannot. All of these “tags” and “links” and so forth are incredibly fragile in the long view picture.

So to that, I use what can be automated flowing outward from the core data. If OpenMeta can be automatically updated from a text file with a special script, then great I’ll take advantage of it and use it to the nth degree however I can, but if OM collapses my core meta-data is still there. I use Boswell to maintain an elaborate workflow with these buckets, and all without having to give it much thought, but if it collapses I won’t lose anything. Internal links? Same issue. If I used DTP’s cross-links or VoodooPad’s then I’d be really screwed if I had to move to Linux, so my links are all simplified MMD. Even if MMD fails, the stuff that makes it work is just conventions in a text file. That will always be useful even if there are no more computers left.

Working Journal vs. Immutable Archive

@Lettermuck: Regarding your ‘Boswell’ approach to never modifying archived files. Do I understand correctly, that you keep working files together in a ‘work’ area for up to 30 days? These files can be edited and added to, until such time as you archive them?

That is correct, though I often archive far before the 30 day expiration, even if it isn’t done yet, that is just a maximum.

Incoming Stuff

It would be interesting to know how you treat files that come in to you from external sources. Do you rename these files?

I not only rename them, I also MMD them as well. :slight_smile: Everything in my system is MMD, or is an MMD file that points to a media resource. Since MMD is so simple, this generally isn’t a difficult task. It’s usually just a matter of tossing some meta-data on it and potentially double-spacing paragraphs. Once something has come in to my system though, it slots right in with everything else. You get that same advantage of a system with no taxonomic edges for imported stuff, too.

If not then it would be difficult to fit them into your system, but if you do, it can be difficult in liaising with the party who sent the file (and knows it under the original name).

That’s not as big a problem as you might think. Look at it this way, the archive is an internal storage system. I don’t (can’t) even edit it. Other people certainly don’t edit anything in it or even see it. If there is a file that is going back and forth between parties, and is still being actively edited, this is how I treat it:

  1. They send me a file that will be mutually edited over a period of time
  2. I create a stub entry in Boswell, the clock is now ticking, and I point to the original file in this stub entry, much like I would a media resource
  3. If 30 days pass and it is still being edited, I copy the current file contents into the stub file and Versionize it… now there is a new stub file and another 30 days.

So as you can see, the other person is not aware of any of this. The archive is my own internal system. To a degree the MMD is an internal system too. I prefer to render out a copy before sending or posting it.

Boswell

I have looked at Boswell. Any chance you can convince the developer to produce a Cocoa version?

I’d write to the developer and let him know you are interested; that’s the best way to show there is demand.

I can’t really say for sure if there will ever be a Cocoa version. I’ll say I have reason to believe that is a possibility, but I also really don’t have any reason to say it is inevitable. :slight_smile:

You are definitely right in your final assessment though. It is an idiosyncratic product which would drive one nuts unless they really like the way it works already; it is extremely rigid in some philosophical ways like few programs are. It’s also not for squirrels, as you note. It’s primarily a text authoring program, built for the mass archival of text, whether collected or generated, and it has no interest in images or PDF files or what have you. It’s a specialised tool in the same way you wouldn’t expect iPhoto to archive e-mail. I realise it’s trendy to have “everything buckets”, but personally I’ve never really seen much merit in them. I’d rather have a comprehensive index card system that, if necessary, illuminates but does not contain what my file system already does a good job of handling: files.

Point of taste, no doubt.
