Is there software that can apply keywords to annotations?

My question is similar to another recent topic on this board about tools for annotating and then extracting notes from PDFs. I am new to the board and to Scrivener, so I hope I am not touching on an old and tired topic.

I am unable to find a software solution for the type of annotation and note taking that I would like to use in my research. My primary source materials are scientific journal articles. I am not attracted to tools that allow highlighting and editing of PDFs, even if annotations can be exported. LiquidText looks beautifully implemented, but doesn’t appear to scale well for a project that contains 50 to 100 associated publications. In addition, I regularly encounter PDFs that are faulty, where text selection is broken or yields garbled text.

When possible, I prefer to capture the HTML or RTF version of articles from journal publisher websites. I may begin using Markdown. I am currently leveraging the built-in capabilities of DEVONthink Pro to capture, organize, and search the content. This approach is imperfect, as some journals do not provide an HTML version of their publications. In current practice, I have a confusing mixture of note-taking approaches.

Of the many different note-taking tools I have explored, Scrivener comes nearest to the required feature set. Scrivener’s support for keywords, metadata, and inline and linked annotations looks quite powerful. Combined with the Outliner view, collections, and the flexible export options, it comes very close to my needs. What is missing? The main gap, as with other tools, is that keywords are applied to an entire document rather than to arbitrary text runs within a document.

Assuming I am working with a program like Scrivener, I would like to apply an inline annotation and include a keyword along with it. In its simplest form, the annotation would be the keyword(s) applied to a block of selected text. But sometimes I may want to include a personal comment or my own more concise version of the original text block. Over time, across a body of journal articles, I could create a collection that gathers all the annotated text blocks associated with user-specified keyword(s), along with any of my associated comments. The icing on top would be if selected graphic elements (e.g., data figures) could be annotated and “collected”. Additional metadata would also be captured: the bibliographic source, PubMed ID, DOI, year, first author, a link back to the annotation in the document, and so forth.

Some example keywords:
Therapeutic Hypothesis
Patient Population
Trial Results
Biochemical Assay
Cellular Assay
Rodent GOF
Rodent LOF
Idea for Follow-up

I could almost do this now with Scrivener. I would have to stringently apply my own keywords (no typos) to inline annotations. I could copy the document snippet and use it as the content for my inline annotation, allowing its later extraction. But that is a lot of gymnastics, and it is impractical to manage a controlled keyword vocabulary manually.
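To illustrate what I mean by a controlled vocabulary: the check I want the software to do could, in principle, be scripted over exported annotation text. This is only a hypothetical sketch; the “[[kw: …]]” marker syntax is my own invention, not a Scrivener feature.

```python
# Hypothetical sketch: validate keywords embedded in exported inline
# annotations against a controlled vocabulary, flagging typos.
# The "[[kw: ...]]" marker syntax is an assumption, not a Scrivener feature.
import re

CONTROLLED_VOCAB = {
    "Therapeutic Hypothesis", "Patient Population", "Trial Results",
    "Biochemical Assay", "Cellular Assay", "Rodent GOF", "Rodent LOF",
    "Idea for Follow-up",
}

KW_PATTERN = re.compile(r"\[\[kw:\s*([^\]]+)\]\]")

def check_annotations(text):
    """Return (valid, invalid) keyword lists found in annotation text."""
    found = [kw.strip() for kw in KW_PATTERN.findall(text)]
    valid = [kw for kw in found if kw in CONTROLLED_VOCAB]
    invalid = [kw for kw in found if kw not in CONTROLLED_VOCAB]
    return valid, invalid

valid, invalid = check_annotations(
    "[[kw: Trial Results]] promising efficacy. [[kw: Trail Results]] a typo."
)
```

The point is that the software, not the user, should be catching “Trail Results”.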

My own solution to this problem is to remember that a “document” is an arbitrary construct. It can be as long or as short as needed. If a journal article is relevant to five topics, I can create five notes as easily as one.

(In practice, what I actually do is create one note and split it, since the pieces then inherit the bibliographical metadata.)

Back in the pre-computer days, students were instructed to take notes on index cards, one topic per card. That method transfers to Scrivener more or less exactly.


I have considered this. But then I feel I am doing more of the work, not the software. Is the origin of the annotated snippet trackable? Can you easily return to its context? The root level of the organization system, whatever the software, will be brimming with snippets.

I experimented with OmniOutliner, where each top-level entry represents a publication, with columns for Author, Year, and Title. The child rows are the annotated snippets. For assigning keywords, I used a column with the Pop-up List data type. But OmniOutliner only allows selection of one item from the list; I cannot apply multiple keywords. Between that issue and this not being a full-featured solution, I gave up on OO.

Can you describe that splitting workflow in more detail? The inheritance part is key.


I use Papers 3 for what you describe, and you could use Bookends, if you’re on a Mac.

It’s pretty simple, actually. Much more complicated to describe than to actually do.

I keep reference materials themselves in a DEVONthink database. As I read papers, I assign aliases based on the citation information. (Gao2020, for instance.) This sounds pretty crude in a world where hyperlinks are everywhere, but it’s simple and fast, and because it’s just text it’s robust against changes in technology. I’ve successfully retrieved notes and their related papers years later.

Then, in Scrivener, I make whatever notes I want into a document named with that alias. Usually I’m just paraphrasing the paper, but I might also copy and paste quotes, images, whatever as needed. If you want to put the full citation in this document as well, you certainly can. (I probably should, as I spend a lot of time at the tail end of a project checking references, but I usually don’t.)

The next step often waits while I read more material on a topic and learn what the important themes are, but you could certainly do it immediately. I assign a keyword to the document based on the DEVONthink alias. Then I duplicate the document (Documents → Duplicate) and use Scrivener’s Documents → Split command as needed to create as many pieces as necessary from the duplicate. Duplicating preserves my original notes as written in case I need them for context later on. The pieces will inherit both the title and the keyword: using the title to begin with lets me skim the Binder to see what I’ve read already, while using a keyword allows me to change the title without losing the alias.

I now have a Binder full of “index card” analogs: each document contains one author’s thoughts on one topic. From here, I can do whatever I want. I can assign topical keywords, I can group related “cards” together. (Using the Corkboard, if desired, because Scrivener will autofill the Synopsis if I haven’t explicitly assigned one.) I can duplicate some or all of the cards and drag them into another project. I can export to another program. I’ve done all of the above at various times.


Katherine, that description was illuminating and has convinced me to give Scrivener a test run. Reading about the Duplicate and Split commands in the manual was instructive.

In this workflow, is one “atomized” and annotated publication a single Scrivener Project, or could I store multiple such annotated publications in a Project? If my goal is to use keywords assigned to snippets derived from multiple related publications to generate a collection and perhaps an export, does it matter how the underlying publications are organized in the Binder hierarchy?

A keyword based on the citation – Author and Year – would not be sufficiently unique, but including some title words could be. I will have to experiment.

As I said, I keep original publications in a DEVONthink database. I have one “Notetaker” project per year, but you could split your notes up however you like: by year, by topic, by tying them to a “real world” project like a book, and so on. For long-term storage, I’m experimenting with exporting an entire “Notetaker” project from Scrivener and re-importing it to DEVONthink. (I find DT somewhat more seamless than Scrivener when handling truly large research collections – millions of words – that relate to more than one “real world” project.)

Scrivener doesn’t care what keyword you use. Using something like the DOI code for the paper in question would work just as well. I like my approach because it’s short and human readable, and the occasional conflicts are easily handled by just sticking an extra letter or number on the end (Gao2020a, Gao2020_2, etc.) Even if I miss a potential conflict, it’s not the end of the world: DT will show me all of Dr. Gao’s 2020 papers on demand, and it’s easy to clarify which notes refer to which.
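The alias scheme above is mechanical enough that you could even script it if you wanted to batch-generate keys. A minimal sketch, assuming AuthorYear with a letter suffix on conflict (the function name and interface are my own, purely illustrative):

```python
# Illustrative sketch of the AuthorYear alias scheme: Gao2020,
# then Gao2020a, Gao2020b, ... when the base alias is already taken.
import string

def make_alias(author, year, existing):
    """Generate a unique AuthorYear alias, appending a, b, c... on conflict."""
    base = f"{author}{year}"
    if base not in existing:
        return base
    for suffix in string.ascii_lowercase:
        candidate = base + suffix
        if candidate not in existing:
            return candidate
    raise ValueError("too many conflicts for " + base)

taken = set()
for _ in range(3):
    taken.add(make_alias("Gao", 2020, taken))
# taken now holds Gao2020, Gao2020a, Gao2020b
```

In practice I do this by eye, of course; the sketch just shows how little logic the scheme actually requires.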

For export purposes, the organization of the exported snippets matters a fair amount. The organization of the source publications doesn’t. In fact, part of the point is that the organization that makes sense for the source publications – by year, by author, by general topic area, whatever – may not make sense for whatever your planned output is.