Keeping track of research documents...yeah, again.

I’m coming in late and only skimmed the thread, but have you tried Journler? It’s the one app I know of that allows quick changes of creation/modification dates right in the list, and I believe it can do this as a batch operation in “Get Info,” but it’s been a while since I’ve used it. The application was semi-defunct for a couple of years, but the developer has recently revived it and made it available for free. It might be worth checking out because it has all the other attributes of an information storage utility, very elegantly implemented. Just a thought…

I don’t know if this will be useful, but Aperture allows you to batch change dates quite easily. Since you’ve used and liked iPhoto, it might be what you need.

It’s time to summarize my experiences since starting this thread.

In 2006 I started collecting source material for biographies of “polar heroes”: letters, diaries, reports and published articles from a host of different sources, such as archives, libraries and private collections. I decided (luckily!) to simply photograph every single page. Simple, fast and very reliable. The amount of material that had to be dated and somehow sorted could be huge. For example, during one day in an archive in Reykjavik, Iceland, I photographed close to 2600 pages.

I used iPhoto to do this sorting and dating. It was fairly fast and easy to give each photo the original document’s date, and tagging was also not too time-consuming.

After the publication of two biographies ( … +skarstein ), I started collecting material for the third, fourth and fifth (whee!) projects. However, with the release of iPhoto 11, dating was no longer so fast. And I did feel that I had outgrown iPhoto and that I was missing out on some awesome software out there. Hence this thread.

I wanted a way to keep track of my documents. To do that, I need to be able to date the files. This lets me sort material from many different sources into one fat chronology, which is extremely helpful. I also need to keep track of where the material comes from and, not least, to quickly search and retrieve it by keyword.

So I set out to find software that did what iPhoto had done, just a bit better. But then I found DevonThink, and it had OCR. I hadn’t even thought about OCR before, and for the past three or four days I’ve not been able to think about anything else. Awesome!

I have followed up on every tip in this thread, more or less, tested all the software and more, and learned a lot about the philosophy behind a bunch of them. Fun stuff, even though at first I was freaked out and nervous about losing all the information I had so painstakingly entered into iPhoto over the past five years. But gradually I understood that I was moving towards something better than what I was leaving, and it got quite exciting. Fun process!

Here is what I’ve ended up doing:

I exported my ~15 000 documents from iPhoto using Phoshare ( ), free software that let me export everything I needed from iPhoto, with one dramatic change from how I used iPhoto. Instead of entering the original document’s date (i.e., when the letter was written) as the file’s creation date, I had Phoshare put that date into the file name, in the format “YYYYMMDD_FILENAME.JPG”, e.g. “19211231_IMG1234.jpg”. This sorts correctly and is by far the fastest way to change dates on the documents. Phoshare also exported the files into one folder per iPhoto event, named after the archive collection session, i.e., the archive institution, the archive name and the date of collection.
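The “YYYYMMDD_” prefix works because a zero-padded year-month-day string sorts alphabetically in the same order as the dates themselves, so any plain file listing becomes a chronology. A minimal Python sketch of that idea, where `dated_name` and `doc_date_from_name` are hypothetical helpers for illustration, not part of Phoshare or DevonThink:

```python
from datetime import date, datetime

def dated_name(doc_date: date, original_name: str) -> str:
    """Hypothetical helper: prefix a file name with the document's date as YYYYMMDD_."""
    return f"{doc_date:%Y%m%d}_{original_name}"

def doc_date_from_name(name: str) -> date:
    """Hypothetical helper: read the document date back from the first eight characters."""
    return datetime.strptime(name[:8], "%Y%m%d").date()

names = [
    dated_name(date(1921, 12, 31), "IMG1234.jpg"),
    dated_name(date(1918, 4, 2), "IMG0007.jpg"),
    dated_name(date(1921, 2, 5), "IMG0450.jpg"),
]

# Zero-padded year, month, day: a plain string sort is also a chronological sort.
chronology = sorted(names)
for name in chronology:
    print(doc_date_from_name(name), name)
```

The same property is why a date-prefixed file name can replace the file-system creation date: nothing else needs to be edited for the files to line up in order.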

So I had a folder containing about 250 folders, with 15 000 files distributed between them, in the Finder. I imported all of these directly into DevonThink Office Pro by drag and drop. All of the iPhoto keywords came in as uneditable keywords. A bit annoying, but who cares; read on.

Then I simply told DevonThink to start converting these JPGs to text-searchable PDFs, that is, to OCR them. It’s still working on it. I’ve gone folder by folder, to stay calm and to avoid confusion if the computer crashes. Depending on the source material, it gets through about two documents per minute with the most extreme OCR settings: I’m asking it to produce the fattest PDFs and the highest OCR accuracy.
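For a rough sense of scale, assuming the rate holds at about two documents per minute (it varies with the source material), the whole ~15 000-document collection works out to roughly 125 hours of continuous OCR, a little over five days. A throwaway Python calculation, where `ocr_hours` is just an illustrative name:

```python
def ocr_hours(documents: int, docs_per_minute: float) -> float:
    """Hours of continuous OCR needed at a fixed, assumed throughput."""
    return documents / docs_per_minute / 60

total = ocr_hours(15_000, 2)  # whole collection at ~2 documents per minute
done = ocr_hours(3_000, 2)    # the ~3000 PDFs finished so far
print(f"{total:.0f} h total (~{total / 24:.1f} days); {done:.0f} h for the first 3000")
```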

The current work-flow:

  • Arrive happy and excited from archives. Absentmindedly greet kids and wife, dump every photograph into a raw file depository on an external hard drive for eternal untouched storage.
  • Import everything into DevonThink as JPGs.
  • Date by adding the date to the start of the filename. Batch date changes are done with A Better Finder Rename ( ), which can add a text string to the filenames of files I drop from DevonThink onto the app icon. Changes are immediately reflected in DevonThink’s file structure.
  • OCR all typed material; tag all handwritten material, and perhaps some of the typed material, with metadata.
  • Search, digest and enter Scrivener with half composed sentences and inspiration.
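A Better Finder Rename handles the dating step in the workflow above. For anyone who would rather script it, here is a minimal Python sketch of the same idea, assuming lowercase `.jpg` extensions and an ordinary folder of files in the Finder (run before the folder is imported, not on documents inside DevonThink); `prefix_with_date` is a hypothetical helper, not a feature of either app:

```python
from pathlib import Path

def prefix_with_date(folder: Path, yyyymmdd: str, pattern: str = "*.jpg") -> list[Path]:
    """Hypothetical helper: prepend 'YYYYMMDD_' to every matching file not already dated."""
    if len(yyyymmdd) != 8 or not yyyymmdd.isdigit():
        raise ValueError("date must look like YYYYMMDD, e.g. 19211231")
    renamed = []
    # sorted() reads the whole listing first, so renamed files are not visited twice.
    for path in sorted(folder.glob(pattern)):
        if path.name[:8].isdigit() and path.name[8:9] == "_":
            continue  # already carries a date prefix; leave it alone
        target = path.with_name(f"{yyyymmdd}_{path.name}")
        path.rename(target)
        renamed.append(target)
    return renamed
```

For example, `prefix_with_date(Path("Kew_session"), "19180402")` (folder name hypothetical) would turn `IMG0007.jpg` into `19180402_IMG0007.jpg` and skip any file that already starts with a date, so running it twice does no harm.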

That’s it! I have to say it’s just amazing how much better a feel I now have for the source material. I have been searching in the roughly 3000 readable PDFs DT has churned through so far, and I’m blown away. OCR is working so well, even with poor photos of crumpled RFC flight logs from 1918. Nice! It doesn’t get every word, but it gets many.

So, thanks to everyone here for helping out! I can’t wait to bounce into the National Archives in Kew this coming Tuesday to really put the new routines to the test.

Yay! Glad to hear your new workflow is going well!

I’d just suggest, now that you’ve settled on DevonThink, that you wander over to their forums. They have a board for usage scenarios, and I’m sure posting your workflow there would bring many helpful suggestions.


Pleased our suggestions were of use. :slight_smile:

Excellent! :smiley:

Thanks for sharing your workflow too. Always interesting to see what others do and to contemplate the possibilities for oneself. Very impressed with what you have done.