Large-Scale Info Managers and Scriv

The name of mine has no date. Weird.

I’m glad you found it useful. I’ve just begun to experiment with doing the same without DTP, i.e. taking notes in Scrivener directly from a PDF, so I’ll have some more comments in the next few days.

Cheers!

Your doubt is justified. I’ve just updated to 10.4.10 and tried dragging some plain text files, both imported and created, from DTP to Scrivener.

No problem. The text files land in the Research folder just as they did under 10.4.9.

Cue X-Files music.

Okay, so I still don’t know why I can’t do the same. I haven’t experimented with it yet, but at some point I will. There has to be a reason, of course. But glad to hear it’s not an OS issue.

Also, just to note, I always do my pdf note-taking directly in Scr. With split views and all the note options, it’s very efficient, at least the way I work it. I’ll be interested to see how it works for you, AJ.

TE, I’ve seen no slowdown related to the number of files or file sizes, save the normal rendering slowdowns with long web archives that I see in any program. But with using Corkboard to preview files, etc., no slowdowns as yet, and I have some pretty huge collections of research and development materials in Scr. project files.

Alexandria

@AndreasE: Do you have more information on Niklas Luhmann’s methodology? How did he cross-reference material?

I too own askSam. I bought a Mac a couple of months ago and I started a search for something similar to askSam. I’ve run into some interesting programs in my search, some of which I think I’ll give a chance (like Journler and Scrivener).

I guess the main problem I’ve run into is that I don’t want to get locked in with any one program that captures my data in its own proprietary format. I want the ability to transfer my data to a different program or even to a different operating system. I haven’t found anything yet that I’m satisfied with. Like you mentioned, I’m thinking of just using text files and devising a method to search through them. (I’ve also started taking most of my notes on 3x5 index cards).

–Carlos

For years I have used my own text-based system for cross referencing because I’ve never trusted a vendor to stay alive in the term of decades; nor (and perhaps more relevant) do I trust myself to stick with one program for that long. I love to program hop, and having a portable, ultra-safe way of storing meta-data and links right in a plain text document has been invaluable.

That said, many of the techniques I have honed over the years have, in the past year, completely disappeared in my work. I am now exclusively using the MultiMarkdown syntax for all cross-referencing, footnotation (that should definitely be a word!), and semantic structuring (saying: This is a title, and this is a sub-section header, for example).

There are two big reasons for trusting MMD to this.

The first is that it is entirely text based. Plain text is going nowhere in the foreseeable future. While there are plenty of “mark up” syntaxes for plain text, many of which are much more powerful than MMD, the latter also has the benefit of being easy to read by a human. Its syntax frankly doesn’t look much like syntax at all (with a few exceptions). It looks just like somebody typed up a document and used regular punctuation marks to “pretty up” the text document. Things like putting stars around a word to emphasise it, or a number in brackets for footnotes. So, even if for some reason MMD no longer works in the future, these documents will still be useful; and not only that, since the format is so simple, a knowledgeable person could “re-build” the MMD engine without too much difficulty.
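To give a feel for how little machinery such a syntax needs, here is a toy sketch in Python. It is not MultiMarkdown's actual code and it covers only two rules chosen for illustration: `#` header lines and `*starred*` emphasis. The function name `tiny_markup_to_html` is made up for this example.

```python
import html
import re

def tiny_markup_to_html(text: str) -> str:
    """Convert a tiny, illustrative subset of Markdown-style text to HTML.

    Assumption: every non-blank line is its own block (header or paragraph).
    """
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        escaped = html.escape(line.strip(), quote=False)
        # *word* becomes <em>word</em>
        escaped = re.sub(r"\*([^*\s][^*]*?)\*", r"<em>\1</em>", escaped)
        # One to six leading hashes mark a header of that level
        header = re.match(r"(#{1,6})\s+(.*)", escaped)
        if header:
            level = len(header.group(1))
            out.append(f"<h{level}>{header.group(2)}</h{level}>")
        else:
            out.append(f"<p>{escaped}</p>")
    return "\n".join(out)

print(tiny_markup_to_html("# Title\n\nSome *starred* text."))
```

The point is only that the source text stays readable on its own, and that a rule of this kind can be re-implemented in a few lines if the original tools ever disappear.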

That moves right into the second reason why I trust it: The technology that it is based on is extremely stable, and based on the underlying industry standard methods that are open source and have thousands of people supporting them. The Perl language, which much of the pre-processing is done with, is going nowhere any time soon. The post-processing is all done in XML, which also is going nowhere. That might all sound like techno-jumble, so to put it into different terms: The types of technology it uses to function are the types of technology that applications themselves use to function. While applications may come and go, the languages beneath them evolve much slower, and last for many decades. It’s the libraries and hooks into the operating system that tend to shift around a lot faster, as any Cocoa developer can attest.

Those are the big reasons. The one big drawback is that there is no easy way to link between documents using MMD (not to be confused with Scrivener documents, as in the end those will all be combined into one document). It can definitely be done, but it requires absolute linking, which is less reliable than in-document linking. An absolute link says “look precisely in this location for the document”, where an in-document link says “look for this identifying name”. As long as you do not shuffle your Documents folder around a lot, it is pretty safe. And to be perfectly fair, this is not a problem exclusive to MMD. Many of the proprietary linking methods used in modern applications will break if you move the source document. The difference between the two is that the latter may or may not be easy to correct, while with MMD the source link is right in front of you, and can be easily corrected.
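Because the link target sits in the text itself, checking for links that have gone stale is easy to script. A minimal sketch in Python, assuming inline links of the form `[label](/absolute/path)`; the function name `broken_file_links` and the example paths are hypothetical:

```python
import re
from pathlib import Path

# Inline links of the form [label](target); targets with spaces are not handled.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

def broken_file_links(text: str) -> list:
    """Return (label, target) pairs whose file target no longer exists."""
    broken = []
    for label, target in LINK.findall(text):
        if target.startswith("#") or "://" in target:
            continue  # in-document anchors and web URLs are not checked here
        if not Path(target).expanduser().exists():
            broken.append((label, target))
    return broken

sample = "See [root](/) and [old notes](/no/such/dir/notes-xyz.txt)."
print(broken_file_links(sample))
```

Anything the function reports can then be corrected by hand in the source document, which is exactly the advantage over a link buried in a proprietary format.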

So how to use MMD with these other applications? Simple, just store all of your archival documents in whatever database or application you prefer. Since MMD files are plain text, they will be universally supported by any sane developer. You can even combine techniques if you have the time. That is what I do with Scrivener. I use MMD style cross-referencing, but then take the time to use Scrivener’s internal linking for future convenience. I get the best of both worlds. One-click cross-referencing, with a fall back method that will last decades. Some programs, like Scrivener, even have built-in support for the MMD format. Hopefully it will become adopted on a wider scale in the future, but an archival application’s lack of ability to render a pretty version of an MMD document is by no means a deterrent for storing the source document within it. Finally, because the syntax is so clear and simple, it will rarely mess up an archival application’s own built-in indexing methods, where more intense mark up syntax might render search indexes useless.

What do you think of the markup language Textile? Headings are signified by h1., h2., etc, up to h6., so you don’t have to count hashes. Does tables too.

Between Textile and MMD, strictly going by how readable the syntax is I don’t have a huge preference either way. They are both pretty legible to me, with perhaps a slight edge given to MD, and counting hashes isn’t something I notice until the header level gets pretty deep. For (my) normal usage of up to three levels or so, it is pretty obvious at a glance. There are times when I’ve wished for the h#. designation though. :slight_smile:

MMD does tables as well, while straight MD does not.

MMD’s advantage for me is how many formats it can export to, out of the box, and how simple it is to modify an export format to suit your specific project, or to even create a new exporter entirely. I made a modification to one of the LaTeX exporters to use a manuscript class in a half hour, a BBCode exporter for posting documents to the forum in about a day, and lately I’ve been picking away at a nearly full converter to Mellel’s document format (full support for lists, images, tables, header levels, annotations, footnotes, and so on). As far as I know, doing this would require a lot more work with Textile since it doesn’t have a post-processing stage built-in. I could be wrong about that though, as I’ve never looked too deeply into that end of it.

But either one is a good choice for many of the reasons in my prior post. Both official distributions are based on solid technology, and both have huge community support bases with many implementations written in even the most modern and trendy scripting languages. MD seems to have a slight edge on the authoring side, at least on the Mac, and Textile seems to have a slight edge on the webapp side. It seems like more blogs offer Textile, or Textile compatible, by default.

And of course, being built into my primary writing application is a big plus, too. :slight_smile:

I can understand that. :slight_smile:

Thanks. I didn’t know that.

Yes, its export list is impressive. It seems to me there’s an opportunity to script a similar transformer for Textile. PLextile, which can output to RTF and PDF, looks like a start.

I have some texts, but they’re all in German. And even I do not really understand how he managed it.

What I know: He used pieces of paper, the size of a postcard (simple paper most of them, not even cards), and usually wrote only on one side. Every card got a number, but he used a very elaborate numbering system that led to numbers like 21/3d26g104,1 for example. But it seems that he did not have a numbering system like the Dewey decimal classification; instead he expanded his numbers in a rather deliberate/chaotic way. For example, he explained once that if he had a sheet with the number 57/12 where he noted down something (a thought, an idea, a citation, an excerpt from a book), the next sheet got the number 57/13 and so on, but then, when he felt the urge to add a comment to sheet 57/12, he simply gave the new sheet the number 57/12a, continued with 57/12b - or 57/12a1, if he felt like it.
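One way to picture such branching numbers is as a sort order in which a comment is filed directly behind the sheet it comments on. The Python sketch below is only an assumption about that ordering (the description above covers how numbers were assigned, not how the box was sorted), and `note_key` is a made-up helper name:

```python
import re

def note_key(note_id: str) -> list:
    """Split an ID such as '57/12a1' into parts that can be compared.

    Separators like '/' and ',' are ignored. Each part is tagged so that
    numbers compare with numbers and letters with letters; at the same
    position a number sorts before a run of letters.
    """
    parts = re.findall(r"\d+|[a-z]+", note_id.lower())
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]

cards = ["57/13", "57/12b", "57/12", "57/12a1", "57/12a"]
print(sorted(cards, key=note_key))
# ['57/12', '57/12a', '57/12a1', '57/12b', '57/13']
```

Under this assumption a new comment never forces renumbering: 57/12a1 slots in between 57/12a and 57/12b without touching either.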

And of course he linked entries to others, by their numbers. So he had one note, that led him to several others etc. - But what I do not understand is how he was able to find these other items. If he had jotted down a commentary to X and had the idea, “well, this has a relation to Y” - how did he know which number to insert? He must have had a kind of index in addition to his collection of notes.

I guess what made his system valuable was that he kept with it for an incredibly long time - his whole life, practically.

My impression is that the askSam of the Mac-world is DevonThink.

I have everything as textfiles, html-files and the like in a huge file system, a tree that goes down from one main header named “InfoBase”. And I simply use what OS X Tiger provides - Spotlight and smart folders - and I am delighted how easily I find everything. Plus I’m safe, TXT and DOC and HTML will stay accessible as long as we’ll have computers in our lives.

Hi,

these seem to be the two most important conditions to set up a valuable information system: consistency and an always readable format.

I am too lazy to keep up a well designed system over a long time, so I have to keep it as simple as possible. I thought DT was the solution, but it is still closed – toward the Finder and towards other DT databases. So I only use DT to clip web pages or download parts of sites, which are all exported as soon as possible. Like Andreas I am now only working in RTF, Text, HTML and PDF, and I retrieve my data via Spotlight. This “system” has worked well since February. Pictures are catalogued in Aperture with IPTC tags only – so I can take a small computer to do exactly the same with GraphicConverter when travelling, and Spotlight can read them. In addition I use NiftyBox to assign tags to the Spotlight comment field, which I have set up in a data model that fits my needs. These tags are particularly necessary when working in a multilingual environment. It works great, with not much additional work.

I wish I could recommend DT, it has so much potential and is so reliable.

Best,
Maria

Interesting, Maria. That is more detailed than I’ve heard before about your ‘new,’ post-DT system. Sounds appealing and simple and not dependent on any program. With this system, it would be much easier to spread research material around to different Scrivener projects and other programs as well. I already do this with my pdfs. Hmmmm. I have been waiting to see if the DT folks are going to be coming up with v.2 any time soon, but nothing yet. Also, I’ve seen some reports about Leopard and increased search capabilities, etc. May make DT even less relevant. When I have the time, I’m going to look into this more. Thanks!

Alexandria

PS Dang! Nifty Box is really interesting! I’m liking this idea more and more!

Tagging and Storing and Spotlighting and Metadata and Databasing and Context oh my! (crossposted in the Zen of Scriv. forum)

First, let’s begin with a tautology, as evidenced by the four pages of forum commentary preceding this. There is no one right way to organize your data. Say it with me now: “Duh!”

That said, I think we can gain insight by looking into what makes a certain type of program appeal to a certain type of workflow. So, here’s a breakdown of what we need from our programs, extracted from the prior discussion. Note: Almost all of us seem to use a combination of these tools, but the most important is the FIRST interaction, basically what you do immediately following acquisition of a piece of data.

I’ve broken this post up into a few sections. First, we get the basics, a by-no-means complete listing of the ways we interact with our data. Then it’s on to the programs, what does what well, and which ones seem to appeal to certain types. Finally, we get into a couple example workflows, culled from members of this forum, with a bit of my analysis following.

Important note: I may be making an artificial distinction here, but I’m choosing NOT to discuss database vs. plain folder structure, durability of information, or any of that stuff. That seems to go more towards how we deal with data in the abstract, rather than how we deal with specific pieces of incoming data, and I like to think it’s beyond the scope of this post. However, I do understand that things like compatibility, freedom of access, and all those other things do play a role in the decisions made, so maybe I’ll get into those later.

The Basics

Tagging: One of the most obvious tools, it allows us to attach keywords to documents, crosslink, and set up a sort of complicated flowchart of connections amongst our stuff. Seems to appeal to folks who accumulate huge amounts of information and need a flexible way to organize it, without losing sight of the whole. It lets us put a simplified abstraction map over our stuff, so we can find generally what we want fairly quickly. However, tagging is vulnerable to proliferation, in which each set of new data gets its own tag, such that the abstract map begins to become just as complicated as the underlying data structure in the first place.

Searching: Be it Spotlight, Devonthink, or cmd-F, we need a way to hunt through our stuff. Tagging tackles this problem by creating another layer of abstraction that lets us search for concepts, if not specifics. Searching lets us dive through the nitty-gritty files. Also is infinitely extensible, with larger collections of data merely taking slightly more time to search. Seems to appeal to folks who need to find a specific thing in a specific document. This requires the least user-interaction at acquisition, but falls through if you find yourself doing the same searches over and over again.

Smartfoldering/Saved Searches: This is for those exceptionally dynamic data collections, when we have so much input we can barely remember what went where and when. Less flexible than tagging, but also requires less effort on the front end. On the downside, sometimes things just fall through the cracks.

Taxonomy: This is the most basic tool. Get document, stick in folder. Works best when your data is easily classifiable, and each subclassification contains a relatively small number of documents. Lots of documents, folders, etc. sure, but it follows the handy tree structure, and it seems to be the most easily integratable (is that a word?) with the other techniques. Also, this is the one we’re ALL familiar with, from years and years of using computers with folders within folders within folders.

The Programs

There’s been a whole BUNCH of discussion about the pros and cons of each program. What’s included here is by no means exhaustive – for that, use the handy search link at the top of the forum – but it is meant to provide a brief overview in the context of the previous section.

  • DevonThink. The 800-pound gorilla in the room. It’s lightning-fast, has ludicrously powerful searching, as well as a built-in AI that works to classify documents on the fly. Primarily, DevonThink seems to cater mostly to the taxonomy/search crowd – what with its lack of tagging (which is hackable, but far from ideal), relatively spartan interface, and fairly limited smartfolder support (again, hackable, but not ideal), DevonThink attempts to make up for in speed and “intelligence” what it lacks in breadth. The classification features seem to work more or less like an on-the-fly, adaptable tagging feature. Rather than finding every document tagged X, it goes by context, says, “all of these are related,” and lets you put the label on it. On the downside, DevonThink is SO keyed for this type of large-scale work that anything less feels a bit like shouting into an empty room. And it has the most powerful searching of any program I’ve ever seen: what with its fuzzy searching, phrasing support, and everything else, it makes Spotlight look like sorting through a card catalog.

  • Journler. This is an odd duck. It’s half entry-based journal, half DevonThink, with a few neat features tossed in that make it pretty good at lots of things, but not quite as powerful as others. Journler supports tagging and smartfolders fairly well, though for some reason its smartfolders won’t search WITHIN added documents. On the bright side, you can have multiple documents within a given “entry,” which allows for closer grouping of specific things, as well as a handy, attached place to leave notes without creating a whole other hierarchy just to handle it. Other positives include easy customizability of icons, perfectly usable taxonomy tools, and a really, REALLY cool and innovative way of using nested smartfolders as an implied AND operator (the top smart folder only includes items found in the lower smartfolders). The entry- and date-based structure gets in the way of the documents sometimes, though, and it’s not quite as speedy or robust as some of the other options. DISCLAIMER: I’ve decided on Journler for my work, as it fits me the best of what I’ve found. I have, however, tried every program I’ve listed here, and I’m trying to be as impartial as possible.

  • EagleFiler. This one seems more geared towards mass data acquisition and tagging. A large portion of its interface is devoted to rapidly tagging and sorting incoming data. But hey! It’s got smartfoldering, too. There’s not too much else to say, here.

  • Yojimbo. This is the simplest of the bunch. Folders (but not nested folders!), tags, a few different types of entries to organize things even more: it tries to organize the data by type, and then, within those types, allow for some level of organization. Personally, I’m not a big fan of Yojimbo, since it’s nowhere NEAR as powerful as some of the other options, but if you want something pretty and simple, with a heavy bend towards tagging, it might be worth a shot.

  • Nifty Box. Even more than EagleFiler, Nifty Box is all about the tags. The ONLY way you sort things here is by tags. Folders? Nope. You can have folders of tags, sure, but tags are the primary way you interact with your data. If that’s your game, more power to you! Otherwise, you might feel a little stifled with the lack of taxonomy features.

  • Voodoo Pad. Almost no one on the Lit&Lat forums uses Voodoo Pad, and neither do I, really. It’s a wiki-based information manager, which lets you dump whatever you want into these documents and then crosslink willy-nilly. Think brainstorming, drawing arrows between and among everything you have, but with fairly little overarching structure. There are folks out there in the Mac community who swear by this program, but I’m not sure if it’s as useful for the writer-types who happen to frequent this forum.

  • Plain Text/MMD. This is both a technique AND a system. I’ve not employed this one, so I’m going to quote the redoubtable AmberV here, who continually makes me feel like I should probably go out and teach myself MultiMarkdown.

  • Folders & Spotlight. This one doesn’t need so much discussion – hell, we all use it daily. Its primary downside is that for all its power, Spotlight is woefully underpowered when it comes to searching. No phrasing, no operators, and it searches your ENTIRE HARD DRIVE. This can be slow. Scratch that, for anyone who accumulates enough data to need an information manager, it IS slow. But hey, it’s free, it comes with the OS, and it’s something we all use anyway. And it does work for some folks.

  • And, lastly, our favorite, Scrivener. We all know what this program can do – a friend of mine called it “the Aperture/Pro App of writing.” But for all its power, Scrivener seems more project specific – it’s all about the writing, not the research, and sorting through hundreds of folders and documents in the “Research” folder can be occasionally daunting. But since this is the reason we’re all here, I figured I’d include it. From my humble perspective, Scriv has just enough info-management oomph to cope with what’s needed for writing, but not so much that it feels bloated or can get in the WAY of the writing. For what it does, I find it more or less perfect.

The Workflows

To put the mass of the above into context, I thought I’d drop in a few workflows that highlight the various techniques in action. Because I’m writing this thing, I’ll start with mine.

First off, I’m a law student, so my world is all about PDFs and searching through them/extracting specific quotations. I don’t need to worry so much about crosslinking them, since cases on murder usually have relatively little to do with cases on government contracts (but not always!). So I tend to shun the metastructure imposed by tagging until later in my workflow, since the research itself tends to limit my intake to at least a RELATIVELY narrow field. Taxonomy works well here, with a separate folder in Journler (or DevonThink, or whatever) for each project. Searchability is KEY, and smartfolders can help too. This is where the entry-based structure of Journler tends to work out pretty well for me, since if I have a few cases on a specific item, I can group them all into one entry as “resources,”

Brian,

wow, that was a thorough post. I took the time to read it carefully. Let me just add two details about tagging on the Mac:

(1) You mention the danger of proliferation.

That is true; one has to set up a structure. For example, I group my tags into “projects”, “research topics” etc., and the tags’ names have a logical structure that lets me tell project-related tags from status-, author- or otherwise related tags. I can “recycle” the same document for different projects without copying it, just by adding another project tag.

The reason I came to tagging with NiftyBox is that it allows me to keep the whole system consistent. At first I set up the system in a way I thought might be OK, but of course it needed modifications. NiftyBox allows modifications while keeping the whole system consistent, gives perfect control over the process and reacts immediately. It is still not a mature app UI-wise, and the next minor update will not solve all the UI issues that stand in the way of even easier tagging, but as it is, it is already more useful and reasonable than any of the other Spotlight-related apps. I think I have tried them all.

(2) I have set up Smart Folders for all my tags and some useful combinations. The system is by no means slow – on a three year old G5 I get immediate response. The design of Smart Folders is not elegant or easy to use. But still I feel that now I have the best, most reliable, most complete information system I ever had. I find ideas of mine from 12 years ago and think “what a bright girl she was (JOKE!)” — I had absolutely forgotten so much that is now right at hand.
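As a rough illustration of tag names with a logical structure, and of spotting proliferation early, here is a small Python sketch. The `group:name` spelling is purely an assumption for the example (the actual naming scheme is not spelled out above), and both function names are hypothetical:

```python
from collections import Counter, defaultdict

def tags_by_group(tags) -> dict:
    """Group tags written as 'group:name' (an assumed convention) by their group."""
    groups = defaultdict(set)
    for tag in tags:
        group, sep, name = tag.partition(":")
        if sep:
            groups[group].add(name)
        else:
            groups["ungrouped"].add(tag)  # tags without a prefix
    return {group: sorted(names) for group, names in groups.items()}

def single_use_tags(doc_tags: dict) -> list:
    """List tags attached to exactly one document: candidates for merging."""
    counts = Counter(tag for tags in doc_tags.values() for tag in set(tags))
    return sorted(tag for tag, n in counts.items() if n == 1)

docs = {
    "a.txt": ["project:pottery", "status:final"],
    "b.txt": ["project:pottery", "author:luhmann"],
}
print(tags_by_group(tag for tags in docs.values() for tag in tags))
print(single_use_tags(docs))
```

A tag that only ever labels one file is not necessarily wrong, but a long list of them is a hint that the tag map is becoming as complicated as the data underneath it.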

Best,
Maria

My concern with Spotlight-comment style tagging is that the meta-data is not actually stored with the file itself. I’ve had occasions where I’ve copied a file from one computer to another, and the comment disappeared on the second computer. Does anyone know how the comment is getting attached to the file, and why it would peel off in some circumstances? It reminds me of the old resource fork nightmares.

What I do like is how “cheap” it is. Pretty much everything you need is already on the system. Everything to make it easier (NiftyBox, Moru, whatever) is just an added convenience. It does mean you are pretty much stuck with one platform, though; and potentially one computer?

AmberV,

I heard something like that as well, maybe from you? I never lost Spotlight comment data. There were issues with a nice frontend called Spot-something (I do not remember well, it is not developed any more). I tried to check this in the Developer’s documentation but could not find anything; maybe I did not search thoroughly enough.

The problem with Nifty Box is similar with the current implementation: NiftyBox only recognizes tags applied inside the application.

In fact, I had been working with folder actions and AppleScript for a short time, and even thought of writing a simple AppleScript Studio application. But despite its immature state NiftyBox was so helpful that I spent the $20. I did not regret it, and I work (retrieving data etc.) outside of the application, only in the Finder. I can also copy the files to my iBook; the tags are there and I can use Spotlight consistently on both machines.

For the extremely cautious among us, there is an additional solution: from time to time you might like to run an AppleScript that writes a text document listing the files, their file paths (which are simple and consistent, in my case the month I tagged the file, because the file location does not change any more) and their tags. In a worst-case scenario you can write a script that selects the files whose tags were lost, looks them up in the text document and adds the tags again. Of course, more elegant solutions may exist; this is just one that comes to mind at the moment.
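The same safety net can be sketched in Python instead of AppleScript. The sketch below deliberately leaves out how the tags are read from or written back to the Spotlight comment; it assumes you already have a mapping from file path to tags, that tags contain no spaces, and all three function names are made up:

```python
import csv
import io

def write_snapshot(doc_tags: dict) -> str:
    """Serialise {path: [tags]} as tab-separated text, one file per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for path in sorted(doc_tags):
        writer.writerow([path, " ".join(sorted(doc_tags[path]))])
    return buffer.getvalue()

def read_snapshot(text: str) -> dict:
    """Parse the text produced by write_snapshot back into {path: set of tags}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: set(row[1].split()) for row in reader if row}

def lost_tags(snapshot: dict, current: dict) -> dict:
    """Return {path: tags} that were recorded in the snapshot but are missing now."""
    lost = {}
    for path, old in snapshot.items():
        missing = set(old) - set(current.get(path, ()))
        if missing:
            lost[path] = sorted(missing)
    return lost

saved = write_snapshot({"2007-02/a.txt": ["project:x", "status:final"],
                        "2007-03/b.pdf": ["project:y"]})
now = {"2007-02/a.txt": ["project:x"], "2007-03/b.pdf": ["project:y"]}
print(lost_tags(read_snapshot(saved), now))
```

Because the snapshot is itself plain text, it can live next to the documents and be read without any particular program.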

I hope this was your question?

All the best,
Maria

There’s a lot of good information here. Thanks everyone for sharing.

AndreasE/Maria: you both mention that you are only working in RTF, Text, HTML and PDF. I have a question regarding the directory structure you use. Do you create directories based on subject/topic, based on project, or based on dates (or a combination of these)?

–Carlos

Carlos,

here is my solution, it works basically without any ordering in folders:

In my document folder I have a large folder “old sorting” with multiple documents sorted into several projects etc. plus backups of versions. This is going to melt down as I proceed with tagging all these old files and converting them into standard formats. It should vanish by the end of this year.

Then there is a folder “tagged documents”, with subfolders 2007-02 to 2007-07; here I add any tagged document, and next month a new folder 2007-08 will be added. When I add tags to an already tagged document, I do not change its place in the file system, because this is absolutely unimportant. If I sometimes tag a folder with e.g. results of RFA analysis of pottery from a certain site, I do not go to the effort of moving the files out of that folder; they go, together with the folder, into the “tagged files” subfolder. Spotlight finds them anyway.
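The month-based filing rule is simple enough to compute rather than remember. A minimal Python sketch, assuming the folder names shown above; the function name `monthly_folder` is hypothetical:

```python
from datetime import date
from pathlib import PurePosixPath

def monthly_folder(tagged_on: date, root: str = "tagged documents") -> PurePosixPath:
    """Return the year-month subfolder a document tagged on this date belongs in."""
    return PurePosixPath(root) / tagged_on.strftime("%Y-%m")

print(monthly_folder(date(2007, 8, 15)))
# tagged documents/2007-08
```

Since the folder depends only on the date of first tagging, a file never needs to move again when further tags are added.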

A helpful aspect of the tags is that I have a group for versioning: original data, final version, and some versions in between. So when I have almost identical files, I know which one to take, and for documentation I can keep the in-between versions.

Finally I have a folder “smartfolders”, where I have a collection of smartfolders that search for any tag I use, plus smartfolders that look for sensible combinations of searches. These are basically the folders I normally look into, if I do not use a spontaneous Spotlight search.

Instead of searching in the Spotlight field on top, I have a smartfolder in my dock with some presetting for what I usually search (comment, content, etc.). For elaborate searches I use this smart folder, which is quickly populated with search terms. But this is just a personal preference.
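For anyone who would rather script such tag searches than click them together, the sketch below builds a Spotlight query in Python. It assumes the tags are stored as plain words in the Spotlight comment (metadata attribute `kMDItemFinderComment`) and contain no quote characters. The function names and the example tags are made up, and `find_tagged` only works on a Mac, where the `mdfind` command-line tool is available.

```python
import subprocess

def comment_query(*tags: str) -> str:
    """Build a raw Spotlight query matching files whose comment contains every tag."""
    clauses = [f'kMDItemFinderComment == "*{tag}*"' for tag in tags]
    return " && ".join(clauses)

def find_tagged(folder: str, *tags: str) -> list:
    """Run mdfind (macOS only) inside one folder and return the matching paths."""
    result = subprocess.run(
        ["mdfind", "-onlyin", folder, comment_query(*tags)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Only builds and prints the query string; nothing is searched here.
print(comment_query("pottery", "final"))
```

Each `&&` plays the role of one more condition in a saved search, so a "sensible combination" of tags is just a longer query.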

Best,
Maria

Maria, I’m interested in what you say about Nifty Box. What about a researcher who works on two machines and wants to gather data and tags on both of them, yet keep them synchronized? Does the license allow installation on two machines, and how would one keep the two files up to date? Thanks.