Academic Bibliographies: support citeproc as an option during compile

nontroppo · April 21, 2018, 2:50am

Better support for bibliographies is a persistent feature request for Scrivener, and it is one of the most important demands for academic and legal writing. There are of course lots of workarounds and workflows, some of which even generate the bibliography automatically during compile, but these involve third-party tools that many users are uncomfortable with or unwilling to install.

Keith has stated that rebuilding Scrivener around an API is too big a task. BUT there are a few options that would be much simpler for Keith. Chief among them is to bundle citeproc-js, which would be triggered during compile. citeproc takes a series of temporary citation identifiers, a bibliography text database and a style, and returns a fully formatted bibliography. It needs a JS engine, for which I assume Keith can use the already existing WebKit engine (i.e. no dependencies or large additional bundles like Pandoc). It would be the responsibility of the user to supply the text database and style file (which could be stored in the Binder).

The author of citeproc-js, Frank Bennett, is active on this forum, and I believe is sympathetic to offering advice on integration.

Technically, the simplest route would be to use a mechanism like Compile replacements, so we have some text like:

Perception has long been considered inferential {@helmholz1860; @gregory1980}.

Scrivener regex searches for e.g. {@[^}]+}, collects these tags and calls citeproc to generate the in-text replacement and the final bibliography, which is appended to the compiled document. Slightly more elegant would be to use an inline style to mark up the in-text temporary citations.
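
As a rough illustration of the scanning step only, here is a minimal Python sketch that collects the keys from citation groups written in the {@key; @key} form shown above. The function name and the exact pattern are assumptions for illustration; this is neither Scrivener's Compile code nor part of citeproc.

```python
import re

# Assumed pattern: one citation group such as {@helmholz1860; @gregory1980}.
CITE_GROUP = re.compile(r"\{(@[^{}]+)\}")

def extract_cite_keys(text):
    """Return cited keys in order of first appearance, without duplicates."""
    keys = []
    for group in CITE_GROUP.finditer(text):
        # Keys inside one group are separated by semicolons.
        for part in group.group(1).split(";"):
            key = part.strip().lstrip("@")
            if key and key not in keys:
                keys.append(key)
    return keys

sample = "Perception has long been considered inferential {@helmholz1860; @gregory1980}."
print(extract_cite_keys(sample))
```

The resulting list of keys is what would be handed to the citation processor along with the database and the style.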

Compile already performs technically much more challenging feats. The major challenge here is deciding at what stage of the compile path this should be done. My assumption is it needs to be done quite early, to keep everything as RTF before compile converts to its multitudinous output formats.

SCOPE: this only applies to the compile workflow. Finding and inserting temporary citations is IMO a separate concern, for which I believe there are already ample options. Some users want CWYW, but this would add a significant level of complexity to the editor and the benefits are subjective and marginal at best. The user would need to provide the database and style files, but reference managers and a simple search of a website take care of this.

COST / BENEFIT: the cost is an added pane and complexity in the compiler. The benefit is obvious for most academic users, because this would provide a way to generate properly cited academic output without the need to scan and fiddle with it in external tools. Scrivener is already an excellent writing environment for academics; bibliography generation during compile would make it even better.

derick · April 22, 2018, 3:10pm

This is a great proposal imho. It’s a platform-agnostic solution & there are so many CSL styles available. And it fits the Scrivener write -> compile workflow.

AmomentOfMusic · April 26, 2018, 8:31pm

I third this. Being able to generate everything at once would be awesome and would really improve my workflow and, as the first poster remarks above, this solution works well with the compiling mentality of Scrivener. What I am doing now is okay, but pretty complicated by comparison.

FYI there are some workarounds to enable CAYW, at least when using Zotero. They are a bit finicky to set up, but well worth it in my opinion.

KB · October 4, 2018, 7:05pm

Although I didn’t reply at the time, I did take note of this request to look into it when I had time. After a quick bit of research, though, I’m not really sure how citeproc is supposed to make academic bibliographies easier in Scrivener - certainly no easier than using Scrivener with a third-party bibliography tool.

As far as I understand it, citeproc and citeproc-js are at the heart of both Mendeley and Zotero, and these are fully-fledged development efforts in their own right despite this. Looking at the citeproc-js page, it seems that an awful lot of coding would be needed to build it into Scrivener. From the description in the original post here, I was expecting something simple that perhaps took a piece of text and a couple of readily-available third-party files and produced a bibliography automatically. I can’t see that there’s anything like that, though.

So how would building code around citeproc or citeproc-js be any different from trying to build, say, Mendeley or Zotero from scratch but as part of Scrivener?

Perhaps you could provide some sample files that show how easy this is?

All the best,
Keith

derick · October 6, 2018, 5:17am

Hi
Thanks for following this up. I’ll tackle a first response and hopefully the other posters will chime in. I think one big advantage is what I meant by saying it’s platform agnostic - as long as the user is using a reference manager that can export a bibliography in BibTeX or csl-json it doesn’t matter whether you manage refs with Bibdesk, Mendeley, Sente or whatever. I think this is much more attractive than an approach that requires the user to commit to a particular reference manager.
Best
Derick

lunk · October 6, 2018, 6:23am

Could you please clarify this? Do you mean that I would have to export my complete library in that format for your system to work?

derick · October 6, 2018, 7:34am

You’d need to have a bibtex file with all of the citations that you want in the document. In practice I find it easier to keep it all in a single file rather than create separate files for each pub.

If you’re on Papers, it looks very easy to do:

gist.github.com/BrainStormCente … 82040d3082

If you’re not I would bet there’s an export to bibtex option on whatever you’re using.

Note that the discussion is for something that would only run at compile, so it’s not as if you’d need to do this constantly or need to have a sync set up somehow.

lunk · October 6, 2018, 7:41am

But that means that I would have to enter the citeprocs while writing, get a list of them from Scrivener, go back to my reference manager and find all those references, export them to a bibtex file, and after having done all that I can finally get my bibliography? Is that correctly understood?

derick · October 6, 2018, 10:29am

No, just export your entire reference database to bibtex and let the computer do the work of determining which ones it needs for the article. You’d need to include the citation keys in the document of course, but everything from there is automated, assuming your refs are in your reference database.
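
To make "let the computer do the work" concrete, here is a small Python sketch that checks which cited keys are present in a full BibTeX export. The regex only looks at entry headers such as @article{Li-2008a, and is a simplification rather than a full BibTeX parser; the function names and the two placeholder entries are made up for illustration.

```python
import re

# Simplified header match: @type{key,  (not a complete BibTeX grammar).
ENTRY_KEY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

def bib_keys(bibtex_text):
    """Collect the cite keys of all entries in a BibTeX export."""
    skip = ("comment", "string", "preamble")
    return {m.group(2) for m in ENTRY_KEY.finditer(bibtex_text)
            if m.group(1).lower() not in skip}

def check_citations(cited, bibtex_text):
    """Split cited keys into those found in the export and those missing."""
    available = bib_keys(bibtex_text)
    found = [k for k in cited if k in available]
    missing = [k for k in cited if k not in available]
    return found, missing

# Placeholder export with invented entries.
export = """
@article{Li-2008a,
  author = {Li, A.},
  title = {Placeholder title one},
  year = {2008}
}
@book{Li-2008b,
  author = {Li, A.},
  title = {Placeholder title two},
  year = {2008}
}
"""
print(check_citations(["Li-2008a", "Li-2008c"], export))
```

Only the entries that are actually cited end up in the bibliography, so the export can safely contain the whole library; a non-empty "missing" list points at keys that were mistyped or not yet exported.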

derick · October 6, 2018, 10:31am

Also, I’d recommend you (and Keith if he hasn’t) try nontroppo ‘s scrivomatic (linked above) to get a better sense of the process.

lunk · October 6, 2018, 11:29am

I don’t like Markdown.

lunk · October 6, 2018, 12:31pm

Where does one find the citeprocs while writing?

kewms · October 6, 2018, 10:32pm

Lunk makes a good point. The citeproc keys have to come from somewhere. So there needs to be some mechanism to look them up while writing. What is that mechanism? Can you use the temporary citation keys generated by your reference manager?

If the user converts their existing reference database to BibTeX format en masse, they then have a single flat file, right? So they have to search that file to find the citeproc key for a particular reference? And what happens a week later when someone sends them three more papers relevant to the project? Doesn’t that undermine all the reasons why people use reference databases in the first place?

Katherine

derick · October 7, 2018, 1:57am

You can use whatever citation keys you want so long as they match the cite key field in the final bibtex file. (For 95% of users I suspect this will be some variation of Author-Date-ID where ID is a unique identifier typically a single letter for cases where Author-Date isn’t unique.) And you can search and insert them however you want.

If you get additional papers, they’ll need to get new cite keys. e.g. Li-2008a, Li-2008b, Li-2008c and so on using the format example I gave above.
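
A short Python sketch of the Author-Date-ID scheme described above, where the ID is a single letter assigned in order of arrival. Reference managers generate keys in their own ways; the function name and the always-add-a-letter rule are assumptions that follow the Li-2008a example.

```python
from string import ascii_lowercase

def assign_cite_keys(records):
    """records: list of (author, year) pairs, in the order they were added.

    Returns keys such as Li-2008a, Li-2008b. This sketch supports at most
    26 records per author/year pair and raises IndexError beyond that.
    """
    counts = {}
    keys = []
    for author, year in records:
        n = counts.get((author, year), 0)
        counts[(author, year)] = n + 1
        keys.append(f"{author}-{year}{ascii_lowercase[n]}")
    return keys

print(assign_cite_keys([("Li", 2008), ("Li", 2008), ("Li", 2008)]))
```

Because the letter depends only on how many earlier records share the same author and year, papers added later get new keys without changing the keys already used in the text.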

Any database is going to require a unique identifier for each record so this functionality should be present in any reference manager in some form or another.

But L&L doesn’t need to worry about any of this. All that is required are cite keys in the text, a bib file, and a csl file. The user can produce these and manage references however they want.

On the ubiquity of BibTeX export see en.m.wikipedia.org/wiki/Compari … t_software

On the 8500+ CSL styles already available, see citationstyles.org/authors/

lunk · October 7, 2018, 7:17am

But as a user I would have to worry about this. That’s why I asked for an example of how this would work.

lunk · October 7, 2018, 7:48am

I use Papers 3, which has a built in system for inserting what they call citekeys in the text. When I am done writing I compile to rtf, open in TextEdit/Word/Pages and simply transform the citekeys and create the bibliography.

derick · October 7, 2018, 8:54am

I use BibDesk and I have it set up to copy the cite key in the format above when I drag a reference out; I also have an Alfred workflow to search and copy the cite key of the current selection and an Alfred snippet to search and paste. All this works system-wide.

How you’ll do it will depend on your choice of reference manager. Searching for how to use your reference manager with LaTeX will probably have you covered, as the process of inserting cite keys should be the same, though the format is different.

nontroppo · October 7, 2018, 11:36am

I’m currently in an airport terminal waiting for a looooong flight, so I will have to wait till I’m back, but just want to reiterate some important points.

There are three main phases to dealing with references:

1. Searching for, storing and managing reference items. This is done manually or via many APIs to academic search systems.
2. Writing your work using some method to link to specific reference items. For citeproc, this is done using temporary keys.
3. Finalisation: scanning the text and replacing temporary keys with styled replacements and a bibliographic list.

Here we are only dealing with the third phase.

Citeproc does not “replace” a reference manager, it simply provides an additional route through which a user can convert temporary citation keys into a finalised bibliography.
Scrivener does not have to recreate all the functionality of managing references. All citeproc does is scan a text file for keys, replace those keys with in-text citations, and create the formatted list of bibliographic entries using a citation style (the CSL file).
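
To show the shape of that finalisation step, here is a toy Python sketch. It is not citeproc and does not read CSL: the hard-coded author-date format stands in for what a real CSL style would control, the refs dictionary stands in for a parsed database, and the titles are placeholders.

```python
import re

# Assumed citation syntax, as in the opening post: {@key; @key}
CITE_GROUP = re.compile(r"\{(@[^{}]+)\}")

def finalise(text, refs):
    """Replace citation groups with in-text citations; return (body, bibliography).

    refs maps cite key -> {"author", "year", "title"}. An unknown key raises
    KeyError, which is how this sketch reports a missing reference.
    """
    cited = []

    def render(match):
        labels = []
        for part in match.group(1).split(";"):
            key = part.strip().lstrip("@")
            ref = refs[key]
            if key not in cited:
                cited.append(key)
            labels.append(f"{ref['author']}, {ref['year']}")
        return "(" + "; ".join(labels) + ")"

    body = CITE_GROUP.sub(render, text)
    # Fixed stand-in style: sort by author then year, one line per entry.
    ordered = sorted(cited, key=lambda k: (refs[k]["author"], refs[k]["year"]))
    bibliography = [
        f"{refs[k]['author']} ({refs[k]['year']}). {refs[k]['title']}."
        for k in ordered
    ]
    return body, bibliography

refs = {
    "helmholz1860": {"author": "Helmholtz", "year": 1860, "title": "Placeholder title A"},
    "gregory1980": {"author": "Gregory", "year": 1980, "title": "Placeholder title B"},
}
text = "Perception has long been considered inferential {@helmholz1860; @gregory1980}."
body, bibliography = finalise(text, refs)
print(body)
print("\n".join(bibliography))
```

In the proposal, both the in-text rendering and the bibliography formatting would come from citeproc-js driven by the user's CSL file; only the inputs (text with keys, database, style) and outputs (finalised text plus bibliography) are the same as in this sketch.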

THUS: a user still uses their preferred reference manager to search, store and manage keys. They will also use their reference manager to easily search and insert temporary citations, i.e. by linking to a reference manager that Scrivener already supports. Other systems exist, like magic citations in Papers, floating citations in Bookends, or several Alfred workflows that automate working with Zotero and others.

MAIN ADVANTAGE: this removes the manual scan stage that all academics apart from those who use Endnote and Word (which supports CWYW) or Bookends and Mellel need to perform. It means when you compile in Scrivener, your document already has all the temporary citations transformed into finalised text.

@kewms: the bibtex file can be automatically created from several reference managers; for example, I use AppleScript to automate this from Bookends. The user does not interact with the bibtex file; it is merely an intermediate “dumb” copy of their database, useful for automation.

Now as to providing Keith with some better example code, I will ask the citeproc-js programmer for advice first and see what he can suggest.

lunk · October 7, 2018, 11:49am

There is a fourth phase, which is the one that takes much more time than any of the others, even if I insert all refs manually: reading, understanding, interpreting and relating each reference to my current work.

This doesn’t sound like an “out of the box” system which I can use directly. It sounds as if I am required to do several extra steps to get it done “automatically”. I guess it has to do with what you are producing. For those that write several books a year with several hundred references in each I can see the advantage. For us writing a few scientific articles for journals per year with each only having like 30-40 references, I don’t really see that it will save any time.

derick · October 7, 2018, 12:53pm

Yes, of course, but that isn’t really germane to the issue of how you manage in-text references, is it?

I haven’t used Papers, but if the facility to add cite keys to the text in Scrivener already works then all the automation I mentioned above is already handled for you. The additional steps would be to provide the bibliography as a .bib file and a CSL file. Given that Papers has been bought out by ReadCube I’m sure they will support export to bibtex.