Large-Scale Info Managers and Scriv

It has been great reading through these posts, and they have really got me thinking about managing my files.

However, I’ve hit one big hurdle. I seem to be unable to reconcile the use of folders with files in my thinking. I understand AmberV’s file naming categorisation, but how does this link in with the use of folders? Creating folders with categories that are not in the filename categories seems to weaken the system as the folder information is not attached to the file.

Do I create folders that represent areas such as work, home and clients, or should this data be in the filename?

Looking forward to some simple answers as my grey cells cannot seem to compute!

The only reason I advocate and use folders at all is to avoid the trillion-file-folder problem. Most stuff gets slow once you have a few thousand files in it, and even if it doesn’t get slow, it gets unwieldy. Since the only reason I use a folder is to avoid this problem, I don’t want to spend any more time messing with folders than I have to, either making them or sorting things into them. The idea, for me, is to make filing as mindless as possible. So to that end I have a folder called 2011, and in that I have a folder called 270. It is the 231st day of the year, so I’ll be using that folder for another ~40 days, then I’ll make a new one called 360. Thus, I make four folders a year, and I never “sort” anything. Everything just goes into the last folder in the list, and so I borrow from the IT convention of having a “latest” symlink at the top level that points to this last folder. Now I don’t even have to bother with whether it is 270 or 360. I just always, always save to “latest” in the Finder sidebar. End of story.
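For anyone who wants to automate the quarterly folder and the “latest” link, here is a minimal Python sketch. It assumes each quarter folder is named after the last day-of-year it covers (90, 180, 270, 360), as in the 2011/270 example; the function names and the exact cut-off rule are illustrative assumptions, not the original poster’s own script.

```python
import datetime as dt
from pathlib import Path

# Assumed convention: each quarter folder is named after the last
# day-of-year it covers; days 361-366 also fall into "360".
QUARTER_ENDS = (90, 180, 270, 360)


def quarter_folder(day: dt.date) -> str:
    """Return a relative path like '2011/270' for the given date."""
    doy = day.timetuple().tm_yday
    end = next((e for e in QUARTER_ENDS if doy <= e), QUARTER_ENDS[-1])
    return f"{day.year}/{end}"


def update_latest(root: Path, today: dt.date, link_name: str = "latest") -> Path:
    """Create the current quarter folder and point a symlink at it."""
    target = root / quarter_folder(today)
    target.mkdir(parents=True, exist_ok=True)
    link = root / link_name
    if link.is_symlink():
        link.unlink()  # repoint an existing link; never touch real folders
    link.symlink_to(target, target_is_directory=True)
    return link
```

Run once a day (from a scheduled job, say), this only changes where the link points when a new quarter starts. Separate trees such as Work and Personal would each need their own call.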

Quarterly folders work well for recollection. They reduce the list of things you need to hunt through by quite a bit, and they’re an easy target for your memory to come up with. First quarter of 2009? Great: I don’t know when I wrote this letter, but I can go through a list of 80 ‘M’ files without too much hassle. Grep that list down if I know the token, and I might only have a dozen or fewer to poke through. So, in a sense, the folder is a piece of filename data that is redundant in my case, since the first part of the filename is the date.

As for folders not being ‘attached’ to the file: well, that depends on how you look at it. The full path of the file does include all of its folder ancestry, and the full path is often accessible; in a UNIX environment it is often as visible as a URL. So in a way I think it’s a pity to waste it on something redundant like the date, but, like I say, I prioritise ease of filing.

Having a top-level distinction between Work and theRest isn’t bad. If you tend to work in a “career”, you might just want to leave it at that. Stuff that I archived from where I worked two years ago is still occasionally useful to me today because I still do some of the same things (taking care of web servers, coding pages, etc.). So I haven’t started a new trunk for Scrivener; it just switched over from theRest to Work at a certain point in time.

  • Work-Latest ⇢ Work/2011/270
  • Personal-Latest ⇢ Personal/2011/270

That’s it.

Many thanks for your quick reply.

So you would have a folder structure that resembles the following:

  • work/year/quarter/id-supercat-mincat-key.ext
  • personal/year/quarter/id-supercat-mincat-key.ext

I’m not really a CLI guy. I do very little in the terminal. I’m a church minister and teacher. I also do some graphic and web design.

Is it wise to separate these areas with folders or use the filename categories?

My structure could work as follows:

work/church/sermons/year/quarter/file.ext

or

work/year/quarter/20110819-0843-R-church-sermon-key.ext

The question is whether either of the above is better than the other.

Interestingly enough, the thought of going in this direction does make me break out in a sweat, as it is so completely different from what I’ve been doing. The reason for changing my system is that it is taking too long to find stuff.

I should also add that in the realm of teaching there are lots of subjects and the question comes again whether to use folder or filename categories. On the whole I prefer the filename category, especially as I’m not a CLI user and the folder then is not generally part of the search query.

You also mentioned that learning grep was useful. Is that something you would say is a priority to learn?

I wouldn’t be able to live with throwing everything into one folder per quarter of a year, not even one folder per month. For my way of thinking it’s vital to keep together the things that belong together. So I do a lot of my thinking with the Finder in front of me and a folder open that contains all the documents belonging to one project. For example, a renovation of a part of my house: lots of scans of documents, a Numbers calculation of costs, important emails as text files, some photographs, etc. In other words, just like a real-world folder of everything that I need to have at hand while dealing with this project.

In my experience, finding a file again isn’t the main problem here. What matters is thinking: what improves it, and what hinders it? Related items kept together are fertile ground for ideas; an all-files-of-the-year folder isn’t. It’s just a big repository (to avoid the term “mess”), a simple storehouse with boxes collecting dust.

So, I have, of course, a main folder “Projects” and a folder for every project in it, the bigger ones with subfolders like “Completed”, “Previous Concepts” and so on; whatever’s needed.

Thanks Andreas.

Personally, I do prefer the chronological structure. Simply because my data will naturally file. If I am involved in a project for a year then after that year it will just disappear. If I create top level folders with project names, then I have to file them when they are complete and the chronology won’t be strictly correct as parts of the project may have been completed long ago.

The key for me, as this great thread highlights, is in how you store your files. Incidentally, I used a programme called Leap (http://yepthat.com/leap/index.html) for a while that did just that. It automatically created a year, month and day folder and filed everything in those. The problem I ran into with Leap was that it used tags to find the information, and that method has its own problems, as mentioned in this thread. Safari saves downloads in chronological order, and I find that extremely useful. As has been previously mentioned, we tend to have a fairly good idea of when we worked on something or created it.

To be honest, the idea of storing things chronologically appeals to my logic but engenders fear, as it gives the impression that you won’t be able to find things. That’s purely an irrational fear with no real basis.

Well ultimately it comes down to what works best for you, the data you are working with, and your relationship with it. It is difficult for me to say.

But I can say this much: example A, where you have a folder hierarchy ending in a simple filename, is the typical way of handling files and folders. You’ve got broad folders at the top level, things get more and more specific as you drill down, and finally you start running into files. The files themselves are named fairly simply, because they don’t need to be unique amongst many; they don’t need to communicate much for themselves when the folder hierarchy has already said so much.

The reason that I myself started developing something that eschews that way of working is because I grew tired of:

  1. Having to decide which folder everything should go in; especially the ambiguous ones
  2. In the case of ambiguous ones and wanting to use aliases/symlinks/etc having to do all of that manual labour in order to keep things consistent
  3. Not being able to easily scroll through a list of everything because I don’t remember which folder it or its aliases are in
  4. Not being able to see items in context with one another even if they are unrelated

So, lots of busy work throwing files around; making folders; and if one is ambitious, aliasing them around so that your mind has multiple access points in the future in case you think of one hierarchy and not the other.

The latter two points are more discovery-based, and they are important to me in that equation. My data and my relationship with it are such that it is very useful to look at a list of things from “around then in 2006”. Even if I’m only looking for one thing, finding that thing amongst all of the things that I did at that particular point in time (and took the trouble to record) is, to me, extremely useful, because I’m the type of person who gets insights and ideas from stuff like that.

I can find an old recorded notion, but I can also see everything that went on around that time that may have led to the notion. It’s not just a file sitting at the end of a long chain of presupposed, descending categorical statements. It’s a link in a chain that is my entire recorded life. Being able to see (to pull out just one example) the development of this system itself, as it was recorded in I-Theory files amongst all of the personal observations, journal entries, e-mails and chats, creative work, photography, and everything else that makes up my life, is pretty cool and exactly what I set out to accomplish.

So for me, the primary premise of this method was to build a better record, because I’m big on that. If your relationship with your data is more needs-based and results-seeking, then some of these more existential properties might be of less importance. The chronological theory might still be useful to you, but for entirely different reasons.

So that is why I prefer no folders at all—and the only reason I started using quarterly folders is, as said, to keep the computer from getting slow. Finder takes an age to open and display a folder with 10k items in it. Over time I’ve come to appreciate the quarterly slices as a feature in and of themselves, too. Like I said earlier, it’s pretty easy to pin down something by quarter and year. You might be foggy on precisely when, but late 2008 is pretty easy to pull out of memory.

And that is of course why I felt the need to develop a formal system for naming the files themselves. A long chain of files with no categorical assignment needs to be able to convey quite a lot in the filenames.
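To show how much such a filename can carry, here is a small Python parser for names shaped like the one proposed in this thread, `20110819-0843-R-church-sermon-key.ext`. It assumes a `yyyymmdd-hhmm` ID, a single capital-letter SuperCategory, and hyphen-separated tokens after that; the field names are hypothetical and not part of any formal spec.

```python
import re
from datetime import datetime

# Assumed layout: yyyymmdd-hhmm-S-token-token...[.ext]
NAME_RE = re.compile(
    r"^(?P<id>\d{8}-\d{4})-(?P<supercat>[A-Z])-(?P<rest>.+?)(?:\.(?P<ext>[^.]+))?$"
)


def parse_name(filename: str):
    """Split a filename into its date, SuperCategory and tokens, or return None."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return {
        "when": datetime.strptime(m["id"], "%Y%m%d-%H%M"),
        "supercat": m["supercat"],
        "tokens": m["rest"].split("-"),
        "ext": m["ext"],
    }
```

Files that do not follow the pattern (legacy names, say) simply come back as `None`, so a script can leave them alone.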

As to which method is better, who knows? I know what works better for my self+data+relationship model, but that’s about all I can speak for. It is interesting to note, though, that Andreas and I are essentially operating from the same stance (your method needs to improve your thinking), yet to him a chain of files is a dustbin. To me, that’s what project and categorical folders are, because once the project is done, the files within it cease to live amongst the rest of my thoughts unless I specifically drill into that folder and muse through it; and even then it is divorced from time and the history of my life. It is just a categorical statement, isolated from everything else. I could turn on Finder’s modified date column and visually cross-reference things, but that’s very inefficient. So that system doesn’t work for me and my thinking.

I mentioned grep as a tool in conjunction with CLI usage because that is how one would approach the problem of paring down a long list of filenames without a GUI. I don’t think it is necessary to learn that skill on a Mac unless you find yourself running into limitations with Spotlight’s fuzzy-results system, or cannot otherwise find a search tool that works for you. grep uses regular expressions, a powerful but fairly complex pattern language. You can Google ‘regular expression tutorial’ to see whether or not it is for you; if it is, you needn’t go to the command line to use it. A number of Mac tools support regular expressions as well. Geeks like them for their power, so you tend to find them lurking in the “advanced” sections of programs here and there. EagleFiler, for instance, allows you to use regular expressions in smart folders.
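The same kind of paring-down can be tried without the command line. This Python sketch applies a regular expression to the names in one folder, roughly what `ls | grep -i PATTERN` does; the folder and pattern in the usage line are placeholders, not real paths.

```python
import re
from pathlib import Path


def filter_names(names, pattern, ignore_case=True):
    """Keep only the names in which the regular expression finds a match."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [n for n in names if rx.search(n)]


def grep_folder(folder, pattern):
    """List one folder's entries, sorted by name, and filter them."""
    names = sorted(p.name for p in Path(folder).iterdir())
    return filter_names(names, pattern)
```

For example, `grep_folder(Path.home() / "Archive" / "2009" / "90", r"-M-.*letter")`. Because the names start with the date, sorting by name also keeps the results in chronological order.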

There are so many ways of finding something, but the chronology + categorical filename system very rarely fails me. For a long time it felt really weird (to the point of panic) to just throw everything into one spot. I fought it for a long time, trying to build super-layers of categorical assignment via keywords or what have you on top of everything. But then I realised I was just replicating the busy-work parts of my old problem. So I pared down these super-layers (which, ironically, I never used, not once, to find anything) and let the information flow naturally as it was experienced, written, and fixed.

Innovations since that point have been entirely to mechanisms within it. I now use a recorded-search method. If I look for something and it takes me more than a few seconds to find it, or if I look for several things, I write that down! I create an index file of my discoveries, the holes in my theories, and areas I feel still need to be documented, and then I file that index record in the system. I’ve been doing that for several years now, and I’ve already benefitted from it. A good example is this discussion right now. For a while my thoughts on the matter were scattered all over, but because I’ve periodically revisited it due to forum interest, or been asked for a list of links on the topic, I now have extensive index records and search history files on it. I can pull up lists with dozens of articles, e-mails, forum posts, and rich back-research into earlier attempts that I’ve voluntarily explored in my free time. I now have a very broad and clear understanding of my efforts on organisation, thanks to the organisation system itself. Sometimes it’s very simple. For example, I remember some idea I had about the philosophical properties of beauty, look for it, and find it in an old 2004 transcription from shorthand written on a bus commute, so I write a new article back-referencing these sources with my updated conclusions. Now there are two points of discovery on this topic in my archive, so the thoughts are twice as easy to find. If I happen across the original (which doesn’t forward-link, due to my immutability guideline), I can simply search for its ID, and the more modern file comes up because it used the original’s ID to back-reference it.
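The ID-based back-referencing described here can be approximated with a plain search. This Python sketch walks an archive and reports every file whose name or plain-text content contains a given ID, so the original and any later notes citing it turn up together. It assumes the citing notes are UTF-8 text files; the suffix list is only an example, and RTF or PDF content would need its own text extraction.

```python
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".md", ".markdown"}  # assumed plain-text formats


def find_references(root, file_id):
    """Return files under root whose name or text mentions file_id."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if file_id in path.name:
            hits.append(path)  # the target itself, found by name
        elif path.suffix.lower() in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if file_id in text:
                hits.append(path)  # a later note that cites the ID
    return hits
```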

So my refinements have been to the methodology within the structure, not to the structure itself.

Maybe it’s a question of how one experiences time (I once read that there are huge differences among individuals). I, for example, often have difficulty remembering when a particular event took place; I often guess even the wrong year (and if asked how old I am, I often have to calculate!). This is one reason why I keep a diary: I have to…

So, obviously a method based on “when might I have worked on this subject?” is not for me. What I remember well are specific words and phrases in a text I am looking for, so Spotlight is my friend (and in older days on Windows I always had tools for full text search, in folder trees or on the whole hard disk, if necessary).

On the other hand, when I move my attention towards a specific project, time doesn’t matter. My novels grow over years (at least six in most cases, but I’ve written novels where the original idea was thirty years old and was stored in a notebook, of course, not in a computer, because I didn’t own one then), so what matters when I open up the folder is what I have collected there over the years: snippets, ideas, first concepts, raw scenes, character studies, a lot of notes, a lot of files. I simply don’t care WHEN I created a certain note; I only care how USEFUL it is.

So, I couldn’t live without folders. The only question for me is how to arrange them in order to make the most of it.

But everybody’s different, of course.

Many thanks, AmberV and Andreas.

I’ve found both your comments insightful. The main point is really creating a system that works for you and this forum has helped me extensively.

I much prefer a folderless system, simply because I have found that the more folders I have, the more there are to search when something isn’t where I thought it was! I also love the chronology. I do have a mental block where I cannot remember when something happened, but I do have a rough guess, and this always narrows down the field of search. The chronology also helps things to bow out gracefully.

A couple of years ago I ended up in a situation where I had 3000 .doc files that needed conversion to RTF. This started my search for non-proprietary formats for longevity. Markdown and MMD have been tremendous helps along the way, and I tend to create a lot in these formats. Conversion is simple, as is creation. They lack some of the complexities, but overall I find we sometimes complicate things unnecessarily. The number of files I can no longer access, because the programme no longer runs on my system and the upgrade price is extortionate, has pushed me in the direction of open formats. I had never really thought of open formats in terms of file storage, even though subconsciously I had an uneasiness about putting all my data into DTP. This system will work until I cease to be, and therefore gives me a greater sense of control.

It does, however, take a great deal of reflection to come up with the method that suits you. Anyone reading this far into the thread should beware. There is lots of superb stuff here, and I hope it will grow. However, these are only the pieces; you cannot use all of them, so choose wisely and put them together in a way that works for you.

AmberV, once you solidified your system, did you then slowly go through all your documents and apply the system recursively?

Also, the thing that bugs me with GUI searching is that folders are treated as separate items. Is there a better way to search? You mentioned grep at the CLI, but what about in terms of a GUI?

Hi Guys,

I’m now sorted with my filing system. However, there are a few further questions with the process of filing.

My unique file id is created with Typinator which replaces some bespoke text with yyyymmdd-hhmm. This generally works really well.

The only snag I’ve hit is email. I’ve decided to save all messages as RTF. I did attempt to save emails with attachments as RTFD, but Mail.app seems to have a problem doing this. My problems are:

  1. I receive a high volume of emails. If I’m out of the office for a day, my Typinator unique ID fails, as it only inserts the current date and time. Is there any way with Hazel (or something similar) to drag a file into a folder and have it renamed with a unique ID derived from the email’s arrival time? Even dealing with an email an hour later will produce an ID that is an hour out. I’d hate to have to hand-type them all.

  2. How do you file attachments? I am treating them as separate documents and using my filing system. However, it would be useful to know which email they came from.

Looking forward to your collective wisdom!

That’s a good point as well; and in conjunction with that, my methods of keeping a thing relevant throughout longer periods of chronology by referencing my searches and annotating my thoughts on older thoughts helps things to keep from bowing out if they shouldn’t.

This is all based on the original concept I came across, the literal box-of-index-cards method that was in part an inspiration. Whenever you pulled a card out of the box to look at it, you would put a little mark in the upper-right corner (up to five marks), but then replace it where you pulled it from, so as to preserve chronology. So frequently referenced cards would stick out in a stack by having little blots of ink. The more blots, the more you’ve looked at the card; it really helps to find things you go back to frequently. It’s a decent method for analogue, but it has its limits. For one, a thing might be frequently referenced in a 60-day period but then never used again, so its frequent-use status in that 60-day period artificially inflates its importance 200 days later. The marks never go away.

Using an internal more-cards-on-cards method embeds the frequency patterns into the system itself. However if someone liked the marking system, they could use star ratings in organisation programs that offer them, to emulate this.

There are other methods for keeping things visible even as they get older. In Boswell I would keep a “Present Tense” notebook with aliased entries, then just remove the entries when I was done. It worked okay, but I wasn’t 100% satisfied with it for reasons I could never pinpoint.

For everything else; the flood of things that rarely ever get used again—yes it is nice to have a system that naturally pushes them out of sight due to the physical layout of a stack. Reverse sorting is useful here. I always sort recent stuff to the top. Project focussed organisation requires further organisation to obscure old things. You must go back and “archive” stuff deeper to hide them. I don’t mind that terribly, but it is one more thing to do on a regular basis. That’s probably what I didn’t like about the Present Tense notebook. I’d rather design a mechanism that naturally degrades visibility due to disuse using plain-text/filesystem methods.

Hmm, something to think about. Maybe Lion’s new “Arrange By” feature could be useful here. It does have a “Date Last Opened” option.
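A rough plain-filesystem version of “Date Last Opened” is to sort by access time, so files that have not been opened for a while sink down the list on their own. This Python sketch assumes the volume actually records access times; many systems update them lazily or not at all (for example with the `relatime` or `noatime` mount options on Linux), so treat the ordering as approximate.

```python
from pathlib import Path


def recently_used(folder, limit=20):
    """Return up to `limit` files, most recently accessed first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_atime, reverse=True)
    return files[:limit]
```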

I go back and forth on that. For a long time I was pretty ruthless about retrofitting old files with modern metadata methods. The bulk of the basic up-front system was extremely easy to do. For 99% of the legacy stuff, I was able to just do a batch rename on everything. I made a script that checked the created date, converted that to my ID, and then appended the old filename to it. Yeah, it doesn’t have the SuperCategory stuff, but date and name is fine for all of that old stuff. After doing that, I could easily cut the list into year/Q folders. So that was maybe an afternoon of work including the script design, and now I have all of that old stuff just about as accessible as everything else. It just doesn’t respond to my token searches, so that is something I am indeed gradually working backward on. I’ve gone all the way back to early 2008 (the final system wasn’t solidified until Jan 4 2009), but of course the further back I go, the less relevant most of these files are to me today, so there isn’t much urgency.
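Here is a sketch of that kind of one-off batch rename in Python (not the original script): it prefixes each file’s existing name with an ID built from its creation date, skips anything that already starts with an ID, and defaults to a dry run so you can review the plan first. It assumes a `yyyymmdd-hhmm` ID. The creation date (`st_birthtime`) is available on macOS; the code falls back to the modification time where it is not.

```python
import re
from datetime import datetime
from pathlib import Path

ID_FORMAT = "%Y%m%d-%H%M"              # assumed ID layout
HAS_ID = re.compile(r"^\d{8}-\d{4}-")  # names that were already converted


def creation_time(path: Path) -> datetime:
    st = path.stat()
    # st_birthtime exists on macOS/BSD; fall back to mtime elsewhere.
    return datetime.fromtimestamp(getattr(st, "st_birthtime", st.st_mtime))


def rename_legacy(folder, dry_run=True):
    """Return (old, new) name pairs; rename only when dry_run is False."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith(".") or HAS_ID.match(path.name):
            continue
        new = path.with_name(f"{creation_time(path).strftime(ID_FORMAT)}-{path.name}")
        plan.append((path.name, new.name))
        if not dry_run:
            if new.exists():
                raise FileExistsError(new)
            path.rename(new)
    return plan
```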

What takes up most of my time, actually, is transcribing paper journals. I’ve got stacks of stuff that I wrote on busses and trains. For about eight years I had a bus/train commute that was anywhere from forty-five minutes to three+ hours long depending on where I lived—and I nearly always carried a Moleskine and pen with me—the amount of material I have left to transcribe is phenomenal. That is a slow process, and new material still accumulates from that vector, as I still walk around with pen and paper in the afternoons and weekends.

I said I go back and forth. Sometimes I wonder if the preservation of technique itself is not a useful thing to keep around. Maybe I should just leave these old files in their organised state, using whatever I used to organise them back then? The manner in which my organisational methods have shifted over time is in itself interesting information. But I think ultimately I would prefer to have everything as accessible as the modern stuff. After a certain point, like I say, it becomes less important. Some old thought from 2003 might be interesting in a “Hmm, my, how I have changed” kind of way, but I usually only ever go through that stuff in a nostalgic sense, so browsing works fine. Like I say, it would be nice but it isn’t necessary, so retrofitting is a very low-priority thing for me.

You mean like in Spotlight? It should show you the full path of any selected item in the footer; if not, maybe that needs to be turned on. But is that a problem if you are using chronological folders? Path data is kind of redundant then, since it is duplicated in the first bit of the filename. The only purpose of paths there is to keep lists shorter and to provide a fuzzier access method when the precise date is foggy in memory. Path data would be more important if categories were in the path and not the name.

Perhaps I am misunderstanding your question, though.

On the second part, I would know better if I knew what you were going for. grep is a pretty comprehensive search tool, there are a lot of things it can do—and likewise there are a lot of different GUIs out there with overlapping qualities. Are you specifically looking for a more literal search pattern ability, for instance—or the ability to get a list of filenames along with match context (like Google does)? Those are just a few examples.

That’s what I do as well. Remember to give them unique IDs (increment the seconds artificially if necessary). Since you are referencing them from the source document using that ID, you can look at a list of attachments, read the ID, then search for that ID to reveal the documents that refer to it. This is what I was talking about in an earlier post, about how back-referencing is useful in both directions, since the ID of the reference target is used in the link text itself. A search for that ID returns both the target and the previously unknown source (and anything else in the network of relationships that deals with that file).
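The “increment the seconds” trick is easy to automate. This Python sketch assumes IDs of the form `yyyymmdd-hhmmss` (the thread does not pin down the exact format) and hands out the next free second for each attachment, so the attachments sort directly after the message they came with.

```python
from datetime import datetime, timedelta

ID_FORMAT = "%Y%m%d-%H%M%S"  # assumed: an ID that includes seconds


def next_free_ids(start: datetime, count: int, taken=()):
    """Return `count` IDs at or after `start`, skipping any already in use."""
    used = set(taken)
    ids = []
    moment = start
    while len(ids) < count:
        candidate = moment.strftime(ID_FORMAT)
        if candidate not in used:
            used.add(candidate)
            ids.append(candidate)
        moment += timedelta(seconds=1)
    return ids
```

For an email received at 22:29:25 with two attachments, passing the email’s own ID in `taken` yields IDs ending in 222926 and 222927.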

Hmm, I don’t have a good tip for you offhand with this one. I rely on my custom scripts to generate the ID in the first place, so I have simply built them to accept a date and time string and generate an ID for something other than now.

Does Mail set the file’s created/modified date to something useful in terms of when the message was sent? If so, check out Hazel’s renaming abilities. It can rename things according to filesystem dates, and has extensive date formatting controls for doing so. You would probably want to set up an “Incoming” folder that Hazel monitors, which performs this rename. I don’t think there is a way to check whether something already has a certain pattern in its filename and ignore it. You wouldn’t want Hazel to be rewriting your existing stuff; plus, if you are using chronological folders, you’d have to remember to keep moving your Hazel target to the latest one.

So far I’m extremely happy with the system. My earlier fears of not finding what I need have passed, and I’ve realised that finding stuff in the old system was actually extremely cumbersome!

I am a church minister by profession. I have a lot of sermons and teaching material, and then various administrative and operational documents, as well as meeting minutes etc. In terms of searching, I really need to be able to search within a document as well as by filename. This has become clear with the new filing system, which allows me to find most things just by targeting the filename. However, some of my documents are up to 100 pages (A4) long. It is useful to search these for references relevant to material I am currently writing, as some old material may have a bearing on it.

My computing skills are fairly good, although I have been putting off learning a shell. I have used the shell many times, but more by following a tutorial to do something specific. It has highlighted the power available there that seems to be lacking in the GUI. I also toyed with the idea of learning AppleScript, but again haven’t quite got round to it. Reading your post, it seems obvious that your coding skills allow you to write scripts to automate your system. Is there a single language/shell that is worth investing the time in to obtain this functionality? Over the years I’ve purchased programmes that do this and that, like DTP, Leap and Yep, and liked the idea but hated the restrictions they create. I should also mention that too many programmes make it a major hurdle to get your data out. I’ve often wondered whether, if I learned a relevant coding language, I could do the stuff myself the way I wanted it.

I had this issue, but decided the best option was a sharp implement and a scanner. PDF the page and add it to your system. It obviously won’t allow you to search the text, but it should allow you to find the document and view it on your Mac.

I’ve spent a considerable amount of time trying to find the most efficient way to collect information relevant to me. This has led me to conclude that computers are just not fast enough for me. I know Markdown and have note apps on my iPhone, I tried Morse code with iditdah, and I connected a Bluetooth keyboard to my iPhone, which made it ultra-mobile and was about the closest I came, but in the end I realised that pen and paper just could not be beaten. The Livescribe pens are interesting, but they are not cheap and you have to use their software. I came across Teeline, which is a shorthand used by many journalists. It can give you speeds of up to 150 words per minute, with a two-month learning time. This was brilliant. It allows me to go fast enough to capture what I need without lagging behind. I use notebooks with perforated pages so I can tear them off nicely and scan them in. It also gives me the added advantage that most people cannot read what I’ve written, which gives me some privacy with sensitive information.

Thanks for the info on Hazel and Lion’s new “Arrange By” feature. Hazel has proved extremely useful in the past; I’ll need to take another look. I also like the idea of incrementing the time on attachments, since it should place them next to the email they came from. Interestingly, Postbox allows you to save emails as text and writes the attachment name into the text. However, this is not all that useful, as attachments rarely come with a name sufficient for filing.

Would I be right in imagining that when you file your documents, even though you use a script, the minor category and key will always need to be entered by hand, so to speak, as they are specific to the document and not easily automated? I suppose it would be possible to create a number of Hazel folders, each of which adds the relevant super category, minor category and key automatically, so that it just depends on where you file your document. Once renamed, it could automatically be moved into the current year/quarter folder.

You may gather that my goal as ever is zero administration!
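The drop-folder idea can be prototyped outside Hazel as well. This Python sketch is built on several assumptions: one watched sub-folder per category, a hypothetical `DROP_RULES` table mapping each folder name to a SuperCategory and minor category, a key derived from the original filename, and quarter folders named 90/180/270/360. It renames a dropped file and moves it into the current year/quarter folder; none of this is Hazel’s own behaviour.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical rules: drop-folder name -> (SuperCategory, minor category).
# Add one entry per drop folder you set up.
DROP_RULES = {
    "sermons": ("R", "church-sermon"),
}
QUARTER_ENDS = (90, 180, 270, 360)  # assumed quarter-folder names


def quarter_dir(archive_root, when):
    doy = when.timetuple().tm_yday
    end = next((e for e in QUARTER_ENDS if doy <= e), QUARTER_ENDS[-1])
    return Path(archive_root) / str(when.year) / str(end)


def file_dropped(path, archive_root, when=None):
    """Rename a file from a drop folder and move it into the quarter folder."""
    path = Path(path)
    when = when or datetime.now()
    supercat, mincat = DROP_RULES[path.parent.name]
    key = re.sub(r"[^A-Za-z0-9]+", "-", path.stem).strip("-").lower()
    dest_dir = quarter_dir(archive_root, when)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{when:%Y%m%d-%H%M}-{supercat}-{mincat}-{key}{path.suffix}"
    if dest.exists():
        raise FileExistsError(dest)
    shutil.move(str(path), str(dest))
    return dest
```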

Should add that mail comes with this date header: 24 August 2011 22:29:25 GMT+01:00

Renaming a file.

I’ve been trying to find a way for a file to be renamed according to the date in the document. I have no problem with creating my own documents and adding the current date and time at the point of creation. Where I’m struggling is, once again, with emails. I can create a Hazel rule that will append the date to the filename. The only problem is that I only get to choose from date created and date modified, and these aren’t the date I actually received the email but the date I saved it. I need the date within the email. I save my email into a folder as RTF. This saved file automatically has the date I received it at the top of the document, under “Date:”. How do I automate the retrieval of this date and its addition to the filename?
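For a scripted route, this Python sketch pulls the first date after “Date:” out of a saved message and prefixes the filename with a `yyyymmdd-hhmm` ID. It assumes the date is written exactly like the Mail header quoted in this thread (`24 August 2011 22:29:25 GMT+01:00`), in English, and on the same line within a few characters of “Date:”, which may not hold for every RTF export. The ID keeps the wall-clock time as written in the header, and the script is meant for an incoming folder, so it does not check for an existing ID.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches e.g. "24 August 2011 22:29:25 GMT+01:00" shortly after "Date:".
DATE_RE = re.compile(
    r"Date:.{0,40}?(\d{1,2} [A-Za-z]+ \d{4} \d{2}:\d{2}:\d{2} GMT[+-]\d{2}:\d{2})"
)


def received_time(text: str):
    """Return the message date as an aware datetime, or None if not found."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    # %z accepts offsets written with a colon, such as +01:00 (Python 3.7+).
    return datetime.strptime(m.group(1), "%d %B %Y %H:%M:%S GMT%z")


def rename_by_received(path):
    """Prefix the filename with an ID taken from the message's Date: line."""
    path = Path(path)
    when = received_time(path.read_text(encoding="utf-8", errors="ignore"))
    if when is None:
        return path  # leave files without a recognisable date alone
    new = path.with_name(f"{when:%Y%m%d-%H%M}-{path.name}")
    if new.exists():
        raise FileExistsError(new)
    path.rename(new)
    return new
```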

Found a solution!

I’ve always flitted between the Mac Mail.app and Postbox. Postbox now has add-ons (extensions), and I came across one called ImportExportTools that imports and exports your email messages.

The add-on allows you to specify the filename structure and more importantly adds the email received date into the saved email properties. You can save the messages as text, html, eml or csv. It also has the option to create an index of your messages as an html page with links to each message. You can also save them into sub-folders if you so choose.

What I do is save the emails into the designated folder as HTML files, using the add-on’s filename options to name each one by sender/recipient followed by the subject. I then have a Hazel rule set that adds my ID based on the date and time, which is now the date and time the email was received. I also add my -M-EMAIL- tags and tell Hazel to make sure everything is title case.
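For anyone without Hazel, the renaming step of that workflow can be sketched in Python. It assumes the export names each message something like “sender - subject” and that the received time is already known (for example, read from the saved message); it prepends the ID and the `-M-EMAIL-` tag and title-cases the remaining words. The hyphen separators are an assumption.

```python
import re
from datetime import datetime


def email_filename(received: datetime, exported_stem: str, ext: str = "html") -> str:
    """Build '<id>-M-EMAIL-<Title-Cased-Words>.<ext>' from an exported name."""
    words = re.findall(r"[A-Za-z0-9']+", exported_stem)
    key = "-".join(w.capitalize() for w in words)
    return f"{received:%Y%m%d-%H%M}-M-EMAIL-{key}.{ext}"
```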

This churned out 400 emails in 5 seconds!

Now onto the file attachments.

Should anyone come up with a better way, please do let me know.

I’m entering this conversation late. I’ve read several entries, but may have overlooked a thread. If so, please forgive the redundancy.

For reasons that others have stated better than I could, I am attracted to the whole metadata-in-the-document approach, rather than using a separate database, or workarounds like Spotlight Comments or OpenMeta, which might not survive migration. Also, I need to include more data than can practically be included in the filename. I can see how embedding metadata in the document would work for documents that I can edit. But how does it work for webarchives, PDFs, and image files?

Never mind. I just found this http://literatureandlatte.com/forum/viewtopic.php?f=19&t=8348&p=68250&hilit=taxonomy#p68250.

I recently had a go at freeing myself from a long (seven years?) reliance on Devonthink and tried managing everything in the file system. I’m a journalist, always with at least 5-10 projects in my “Shipping” folder, plus a constant inflow of new possibilities and a comet trail of loose ends and prospects for information recycling.
When my main DT database started taking a minute or so to open (an appalling waste of time!), I decided to try for a cleaner, purer form of information management. I’ve always been attracted to the ideological purity of managing info in the Mac file system, and Ioa’s posts on the subject in this forum offer an attractive picture of structure built from chaos.
Turns out that I’m just not that committed. I didn’t go to any lengths to build a file taxonomy, partly because I’m very project-centric and projects fit into folders, and partly because I’m lazy.
Last week, I purged my old, fat DT work database and renamed it “Archive”, put anything that was current into a clean new database, and returned with a feeling of vast relief to Devonthink.

  • I really missed DT’s collection tools. I made some Keyboard Maestro macros that clipped mail and webpages with URLs, titles and keywords, but then had to think about where to put the clips in the Finder. DT pops up a folder list and I can file straight to the project. (Ioa’s One Long File List + file taxonomy addresses this problem, but I like folders. They suit the way I have learned to work.)
  • I often clip stuff that may be useful in the future, or information that updates a past project. In the Finder, I shoved this in a collect-all bucket which I would occasionally purge. The purges became less and less frequent, because I dreaded working out where all this miscellaneous stuff should go. In DT, it’s easy: I ask DT’s “See Also & Classify” function, and either get pointed to where related data lies, or learn that there is no related data and it’s time to open a new folder.
  • I missed DT’s search function, which seems to present results more intelligently than Spotlight. I usually find what I’m looking for at the top of a DT search list, whereas Spotlight often involved a treasure hunt or refining of search terms.
  • I found that Rob Trew’s applescripts provide better interaction between DT and OmniFocus than there is between OF and the Finder.

I don’t like having some of my stuff in the Finder, and some in DT. I don’t like searching in Spotlight and finding the hits that are in DT can’t be viewed with Quicklook. But for someone who is an information stuffer, rather than a careful filer, DT is a blessing.

DevonThinkPro has just been updated and the new version is fast. I used to check my email, make a coffee and walk the dog (well, OK, perhaps I exaggerate, but you get the idea) while waiting for my main research database to open. Now it opens in a couple of seconds. Same with starting the app. The first time I launched it after updating, I almost fell off my chair. The difference is truly remarkable. Hats off to the Devon Tech crew for their behind-the-scenes magic. :smiley:

If only they would fix their tie-ins with OS X’s scanner drivers. I used to be able to scan directly into DTPO, but since reinstalling a couple of months ago, scanning is busted (but only in DTPO; everything else is fine). Reporting it is also the only time I have received poor service from DevonTechnologies: usually they are very polite and very helpful, but this time I was just brushed off. Oh well, the software is still great; it just takes a few minutes longer to scan long documents.

I’ve been adapting my system over the past 8 months. I think all systems adapt. I’ve sorted the filing of all my new material, but some stark changes have crept in that have surprised me.

  1. Email is no longer archived to HTML/RTF. This became too cumbersome. I now use MailSteward to archive all my email. No more deleting. It archives every two hours and can handle hundreds of thousands of emails. If the database does become too cumbersome, I can archive the database on a date basis.

  2. There has been an ongoing argument about being locked into a proprietary format. I’ve been concerned by this as well, and over a year ago I switched to Markdown and MultiMarkdown. The problem with these has been the lack of a bespoke editor, and once editors did arrive, getting text properly formatted for print proved quite a drawn-out process of converting to PDF and then printing. I then stumbled on an article (I cannot remember where) that basically stated that MS Word was a good format, the reason being that so many people use it that it won’t die. Interestingly enough, I have some 5000 Word docs, some older than 15 years, and they all open and display properly in the latest version of Word. Not bad really. I know folks will argue about the cost, but there are more than a few open source projects no longer being developed, leaving people high and dry. I now write all my text in MS Word. It’s one programme I am happy to pay for.

  3. I also use Devonthink PRO Office. It’s just a superb programme for collecting snippets of information and tagging them. Once again, you could use tags on your own documents in your file system, but this becomes too cumbersome for me. I also use Hazel (Mac OS X), a handy little automation tool, so that I only have to drag a file into a folder and the date is added to the file name.

  4. I also adopted AmberV’s 30-day folder, a superb idea (thanks, AmberV). When I create a document, it is dragged into an “Add_ID_to_Doc” folder. Once the item enters the folder, the filename is automatically amended to include the date-time stamp at the beginning (via Hazel). The item is then automatically moved to my “30Day” folder (also via Hazel). I then have a Hazel rule that checks whether the item has been modified in the last 30 days. If not, it gets archived in the relevant folder (Year/Quarter). The beauty of this is that I only ever need to work from my 30-day folder, and anything I have finished working on automatically disappears after 30 days. If I need to edit a document that has been archived, I create a copy and place it into the 30-day folder.

  5. One final thing that caused problems was the creation date of the item. There are a number of applications (Scrivener being one of them) where, if you create a new project from a template, the new project inherits the date the template was created. Scrivener is not alone in this. This really stuffed up my system until I came across File Multi Tool 5, which lets me alter the creation date to whatever I want.
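The archiving rule in item 4 (anything untouched for 30 days leaves the working folder for a Year/Quarter folder) can be sketched in Python as follows. This is an assumed equivalent of the Hazel rule, not its actual implementation: the `Archive/YYYY/Qn` layout is hypothetical, and the quarter is taken from the file’s last-modified date, which may differ from the date-time stamp in the filename.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path
from typing import Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


def archive_stale(working: Path, archive_root: Path, now: Optional[float] = None) -> list[Path]:
    """Move files not modified for 30 days from `working` into archive_root/YYYY/Qn."""
    now = time.time() if now is None else now
    moved: list[Path] = []
    for item in sorted(working.iterdir()):
        if not item.is_file() or item.name.startswith("."):
            continue  # leave sub-folders and hidden files such as .DS_Store alone
        mtime = item.stat().st_mtime
        if now - mtime < THIRTY_DAYS:
            continue  # modified within 30 days: still in active use
        modified = datetime.fromtimestamp(mtime)
        quarter = (modified.month - 1) // 3 + 1
        target_dir = archive_root / str(modified.year) / f"Q{quarter}"
        target_dir.mkdir(parents=True, exist_ok=True)
        destination = target_dir / item.name
        shutil.move(str(item), str(destination))
        moved.append(destination)
    return moved
```

The sketch does not guard against a same-named file already sitting in the target folder; since the filenames begin with a date-time stamp, collisions should be rare, but a careful version would check first.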

The system is not perfect and needs more tweaking, but it is holding up well so far. The only problem I have yet to conquer is filing material that needs to be organised by category rather than chronology. I teach and have a lot of teaching material. I cannot file this in my chronological system, as I invariably need to reuse it and need all material on the same subject in one place. If anyone has any ideas on how to do this effectively, I’m all ears!

Still my number of folders is gradually reducing!

Thanks for the comprehensive description of your workflow. Two comments.
For an archiving format, you don’t need to pay for MS Word in order to save documents in Word’s .doc or .docx formats; Pages or TextEdit or Bean will do that, the last two at no cost. I use txt and rtf (which was developed by Microsoft and hence presumably carries the same durability as .doc) myself; it’s hard to imagine text files ever being unreadable in my lifetime.

As for filing by subject rather than chronologically, I faced this issue in my biography, which is mostly organized by year, since I’m telling the subject’s story roughly chronologically. But of course his life contains issues that transcend a given year. This is where Scrivener is so helpful, because you can use keywords to tag files by subject.

A number of free and cheap metadata programs allow you to tag files by keyword in the Finder, and of course you can do the same thing by using the Get Info command to type subject keywords into the Spotlight Comments field. I don’t use Devon anymore, but I recall you could create replicants or some such, and I would make several if a given document implicated multiple subjects.

I keep my original documents in the Finder and import them into Scrivener by project, so I use the Finder, not Scrivener, as the original database. As to how to file the originals in the Finder, what I did was create folders by topic for the non-chronological bits and then tag each document with additional keywords. The key, then, is to make sure you search using Spotlight or something similar, and don’t just rely on the Finder folders you’ve set up.

By the way, you may want to look into the free Notational Velocity, which purports to eliminate the need for such folders by relying entirely on powerful search capabilities. I’m still folder-centric and don’t use it, but I’ve been tempted, and I believe Amber has written about it here. The idea is that powerful search (Spotlight, EasyFind, NV) and smart folders (created via search) replace the need for a folder-based system. I guess I’m just too old-fashioned and unadventurous to abandon my folder-based system, but I don’t rely on it exclusively anymore. Search has really changed everything.

I have more recently started using a “Fields” meta-data key in my documents, where relevant. Fields are basically anything describing why the document exists. In my opinion there are two primary things that describe the state of a document: what and why. What is it? A quotation; an article; a torn napkin. Why does it exist? Road trip journal 2011; editing work for author X; research for book Y. Etc.

This is replacing my “FileUnder” meta-data key which I used only specifically for book notes. I would create a card for the book so it had an ID, and then any quotations, thoughts, follow-ups and notes that arose as a result of reading and researching the book would get “FileUnder” this book ID, so that searching for the book ID would yield a result of all the items which arose because of it, and a dual axis search for What:Quotation Why:ThisBook would also be quite useful.
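The dual-axis search described above is easy to sketch if the metadata sits at the top of each plain-text file as MultiMarkdown-style `Key: value` lines. The Python below is a hypothetical illustration: it assumes the What token lives in a `Keywords` line and the Why IDs in a comma-separated `Fields` line, and any document names and IDs you feed it are up to you.

```python
def read_metadata(text: str) -> dict[str, str]:
    """Read MultiMarkdown-style 'Key: value' lines up to the first blank line."""
    meta: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break  # the metadata block ends at the first blank line
        key, sep, value = line.partition(":")
        if not sep:
            break  # not a 'Key: value' line, so there is no (more) metadata
        meta[key.strip().lower()] = value.strip()
    return meta


def split_list(value: str) -> list[str]:
    """Split a comma-separated metadata value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def dual_axis_search(docs: dict[str, str], what: str, why: str) -> list[str]:
    """Return names of documents whose Keywords contain `what` and Fields contain `why`."""
    hits = []
    for name, text in docs.items():
        meta = read_metadata(text)
        if what in split_list(meta.get("keywords", "")) and why in split_list(meta.get("fields", "")):
            hits.append(name)
    return sorted(hits)
```

Only the first block of `Key: value` lines is read, and indented continuation lines are not handled. On a real file system, `docs` would be filled by reading each file’s text into a dictionary keyed by filename.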

In the 09004 revision of the system, I had a bit of the wherefores embedded in the token. Some tokens were just all about the why. There was for instance an “event” category which had event codes as the final portion of the token. So if something was in relation to moving house in 2010, I’d be able to search for Moving2010 and see all of my notes and such for that move. But of course that raises a problem as it drifts back into the BigOldMassOfKeywords problem that I wanted to avoid in the first place. In some cases the What could be ignored if the Why was strong enough, but this is not always the case. Sometimes an entry needs to declare both equally, and that would mean two tokens in the Keywords line.

So the newer revision of the system that I’m working on now will reserve the token very strictly for What the item is. All Whys will be in the “Fields” key. I haven’t decided yet if Fields should be codified into a token as well, but I’m leaning toward leaving it like the book system worked. I really like how that is so dynamic and scalable. Defining a Why means making an entry to describe what that Why is, and making an entry means it has an ID, which can be used everywhere and benefit from the whole system like anything else in it. The only real downside is that the ID isn’t extremely memorable. Typinator comes to the rescue for current events. I can make up codes like @lnl-site, which expands to “id12003990” which is the Why entry for fixing up the Scrivener web site in quarter one 2012, but “id12003990” isn’t very obvious on the entry side of things. If I come across entry X and wonder what Field it is related to, that doesn’t tell me much without a secondary search. Of course that is the case with this system in general, it is very search dependent—but for something like why an item exists at all, that is perhaps a little too opaque.
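The secondary search that makes a bare ID readable could itself be automated. This Python sketch assumes, hypothetically, that each Why entry carries `ID:` and `Title:` metadata lines. It builds an ID-to-title index and then expands a `Fields` value into `ID (title)` pairs. The entry text and title wording are illustrative, though the ID and what it stands for come from the example above.

```python
def read_metadata(text: str) -> dict[str, str]:
    """Read MultiMarkdown-style 'Key: value' lines up to the first blank line."""
    meta: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if not sep:
            break
        meta[key.strip().lower()] = value.strip()
    return meta


def index_by_id(entries: list[str]) -> dict[str, str]:
    """Map each entry's ID to its title (hypothetical 'ID:' and 'Title:' keys)."""
    index: dict[str, str] = {}
    for text in entries:
        meta = read_metadata(text)
        if "id" in meta:
            index[meta["id"]] = meta.get("title", "(untitled)")
    return index


def describe_fields(fields: str, index: dict[str, str]) -> list[str]:
    """Expand a comma-separated Fields value into 'ID (title)' strings."""
    ids = [item.strip() for item in fields.split(",") if item.strip()]
    return [f"{fid} ({index.get(fid, 'unknown')})" for fid in ids]


why_entry = "ID: id12003990\nTitle: Fix up the Scrivener web site, Q1 2012\n\nNotes..."
# prints ['id12003990 (Fix up the Scrivener web site, Q1 2012)']
print(describe_fields("id12003990", index_by_id([why_entry])))
```

This keeps the stored value as a bare, stable ID while giving the reader the name on demand, which is one way to get the benefit of an annotation without typing it into every entry.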

Of course, my linking syntax does accommodate an annotation field, so I could just type the name of the Why into that field after the ID. I’ll have to try that on for size and see if it is too cumbersome. Typinator would make it easier for sure.

Regarding a bespoke editor for MMD: Have you tried Fletcher’s own MultiMarkdown Composer yet? It’s still early-bird software. There are some glitches, and features are still being built up, but it is already a very nice editor for MMD. Programmable stuff aside, it is close to using the TextMate bundle in terms of ease of use. Lists just work the way they do in a word processor, for instance.

I myself would never archive anything in a proprietary format. Don’t forget that at one point WordPerfect was where Word is now: total domination with no reasonably predictable horizon for continued compatibility and readability. True, it’s a totally different world than it was twenty years ago, in terms of word processing and computing in general (there is much more inertia now than there was), but who knows what will happen. I would trust RTF much more than .doc/x, as it actually has an open specification, complex though it may be, and TXT above all else. But I’m very cautious. I have lost too much data in the past to formats that can no longer be read. It might be that this isn’t as much of a problem any more, given how many people use computers now compared with how many used them in the mid-eighties. Those old formats, even the most successful of them, would count as fringe formats when compared to Office formats today. Despite that, with open formats emerging in the early '00s, that is where I philosophically drifted, and I really don’t have a huge reason to change that now. I like that I can open an entry on anything electronic that I own, from my AlphaSmart to my Kindle to my iPod. You just can’t do that with .doc/x or .rtf. Universally accessible data is more important to me than formatting. At this point, that is almost as important to me as the 20-year question, maybe even more so. I think in 20 years formats will be less important than they are now, but I don’t want to bet on that, so it is still important.