Track Changes / Compare Docs (WAS: Publishers and workflow)

I work with a couple of SF/F publishers (okay: Ace, aka part of Penguin Group, Orbit, aka part of Little, Brown, and Tor, aka themselves) and the novels I write get delivered into their workflow. Which, hitherto, has been the traditional kind: print to paper, send lump of dead tree to copy editor, send copy-edited MS to author for carpet-chewing and corrections, send CEM to typesetter for introduction of new and fascinating mistakes, send galley to author for return death-match, then roll the press and remainder the run.

(Well okay, that’s how it’s supposed to work, but I digress …)

I’m told that from this year, Ace are moving to all-electronic workflow, i.e. they’re getting rid of the lumps of dead tree. MSs will be delivered electronically, ideally in MS Word format. Copy Editors will mess with the text online and email the MS back to the author, who presumably has a word processor that can cope with change tracking. This will then get sent to the tripesetters who will slam it straight into InDesign, then a PDF will go back to the author for on-screen perusal and annotation (presumably with PDFPen or a similar app).

Welcome to the glorious future of the 1990s!

I bring this up because I gather that the other major publishers have similar plans/day-dreams/forlorn hopes, and it’s a potential problem for future Scrivener development. Scrivener’s a great tool for composing the work, but it stops dead at the point where it runs up against the editor’s email inbox. As more publishers move to all-electronic workflow, this is going to become more of a problem. Is there anything that can be done about it – specifically about the change tracking side of things?

(What I envisage is something along the lines of: an option in “Compile manuscript” that will save the RTF or Word output as an extra file in the Scrivener project (say under a heading “published drafts”, comparable to “Research”), and then a tool to compare/merge the RTF/Word file returned from the copy editor before re-exporting it. Obviously preserving the Scrivener document structure throughout the round trip via another word processor is impractical, but is some kind of visual diff/merge tool for individual files achievable? And more to the point, is it desirable?)

charlie, thanks for the hilarious account of how publishing works today. My co-author and I just went through this process with a trade house in NYC. We begged them to let us work in electronic formats all the way, but no, they hired a copy-editor who had no experience in fiction. She sent us paper sheets full of pointless changes (correct grammar in dialogue), and we could do nothing but write STET over and over, thousands of times. Corrected proof came back to us, again on paper. Again the struggle to preserve our original intentions. Publishing is literally stuck in 18th-century methods. (We also found that editors are quite naive about how to handle e-mail and attachments.)

Some smart cookie (probably spelled Bezos) will take over publishing soon and make it all-digital, all the time, and the old houses will fold. Like iPhone apps, books will sell for $1.99 and authors will earn 1% of nothing. And writers will settle for it, because they lust to see their names in print.

While I agree that Scrivener should be mindful of this future, I’d prefer to see it remain a composing tool and excel at that. I can’t imagine publishers wanting to see our Scrivener files. I just wish they’d agree to postpone going to paper for as long as possible.

Desirable, definitely; it’s the practical aspect that is the problem. In an ideal world, what I’d love to implement in Scrivener is this: in 2.0 the snapshots area will be built into the inspector, and what would be great is if, upon loading up a snapshot to view alongside the main text, the snapshot were highlighted with all the changes that have been made. That is, a diff/comparison would be shown right in the snapshots inspector.

Unfortunately, it’s the practicality that is the problem. Diff/merge/compare is not trivial - far from it. If it were, it would be in a lot more word processors and programs already (even Nisus doesn’t have a track changes feature yet, I believe). As I understand it, Word and Pages track changes by having the user switch on this feature, then observing all edits. This is easier (though still far from trivial) than having two files compared. Comparison becomes even more difficult when you have to take into account rich text changes, too.

I did some research into diff algorithms and techniques when considering this, and my head was left swimming. One way of approaching it is to look for the longest common substring (LCS) between the two texts, then the next longest and so on. Any ranges of text that aren’t found as common substrings have obviously been changed. Finding the longest common substring is a mammoth programming problem in itself (just check out the Wikipedia page on the problem), but even if you crack that, then it still doesn’t help you find areas of text that are the same but have been moved and so on.
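
A minimal sketch of that "longest common run first, then the next longest" idea, in Python rather than Cocoa. It leans on the standard library's difflib.SequenceMatcher, which finds the longest matching block and then recurses to the left and right of it; the function name common_and_changed is made up for illustration and is not anything from Scrivener or Nisus.

```python
from difflib import SequenceMatcher


def common_and_changed(old, new):
    """Split two texts into shared runs, removed runs and added runs.

    Hypothetical helper: anything not covered by a matching block is
    treated as changed, as described in the paragraph above.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    common, removed, added = [], [], []
    old_pos = new_pos = 0
    # get_matching_blocks() ends with a zero-length sentinel block,
    # so trailing unmatched text is picked up by the same loop.
    for block in matcher.get_matching_blocks():
        if block.a > old_pos:
            removed.append(old[old_pos:block.a])
        if block.b > new_pos:
            added.append(new[new_pos:block.b])
        if block.size:
            common.append(old[block.a:block.a + block.size])
        old_pos = block.a + block.size
        new_pos = block.b + block.size
    return common, removed, added


common, removed, added = common_and_changed("the cat sat", "the dog sat")
print(common, removed, added)
```

Note that this has the limitation mentioned above: text that has been moved rather than edited still shows up as a removal in one place and an addition in another.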

Disregarding the ability to diff/compare snapshots, a basic track changes feature wouldn’t help much either, as any track changes feature within Scrivener itself would only be of limited use unless it could export the track changes accurately to a Word document - another Herculean task that so far only Apple have managed with Pages, and I hear that that only has limited success.

If everyone just used coloured text instead of full-on track changes, life would be much easier. :slight_smile:

All the best,
Keith

Keith,

I don’t know if it is technically related to track changes, but in Nisus you can examine the “Compare documents” macro. It does a nice job of showing changes between RTF documents.

Paolo

Paolo,

Many thanks for pointing me in the direction of that macro. Interestingly, it uses the command-line diff utility, which I had already discounted but now clearly warrants another look. It’s plain text only, but rich text is incredibly difficult to compare, and it would at least highlight edits. I’m going to look into this more carefully and try to implement something like it in 2.0.

Out of interest, which of the three options in the macro - paragraph, clause or word - is the most useful? Or do you use all three?

Charlie, you can test this out like so:

In Nisus Writer Pro, create a document, save it, then make some changes and save the changed version as a different file. Have both open in different Nisus windows. Download the Compare Documents Macro from here:

nisus.com/forum/viewtopic.php?f=18&t=3458

Then in Nisus, go to Macro > Load Macro to load the Compare Documents macro you downloaded. Then go to Macro > Compare Documents to run the macro.

Let me know if this does the sort of thing you are after. If so, I am hopeful that I could use diff in a similar way to the Nisus macro to compare Scrivener snapshots with the main document…

Thanks and all the best,
Keith

I’ll look into Nisus Writer, but first I need to download the thing – I fell off their upgrade escalator a couple of versions ago due to NW having a couple of show-stopping limitations (being able to simultaneously look at two or more windows into a long document is an essential pre-requisite for writing, IMO).

On the subject of diff(1), note that diff was originally designed to spit out a command script for the ed(1) line editor to transform input file #1 into input file #2. Modern versions of diff (if you can bear to read the texinfo documentation – damn the FSF for horking up that abomination upon the sanctity that is the traditional UNIX man page!) can produce other forms of output, including input scripts for the patch(1) tool (does much the same thing: patches a file from one version to a newer one).

Now the interesting thing about ed is that (a) it’s a line editor, (b) source code is available from a variety of places, including under a BSD license (this goes for diff, too), and (c) it’s very simple-minded: typical commands are regexp/command tuples that unpack to things like “go to line 23, word 3, and delete to end of word”, or “search lines 4 to EOF for pattern /foo/ and replace matches with string /bar/”, or “at end of current line, append string /baz/”.

Applying diff to a file, to give a visual output, you basically look for delete commands and replace with strikethrough formatting, and look for insert commands and execute them with some form of highlighting. Making the edits clickable and doing useful stuff with them is, however, another matter. (Speculation: some sort of metadata tag for altered text?)
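
As a rough illustration of that mapping, here is a sketch in Python. It uses difflib.ndiff from the standard library as a stand-in for diff(1), and plain-text markers as a stand-in for real formatting: the [-...-] and {+...+} markers and the name render_changes are assumptions for the example, not anything an editor actually uses.

```python
from difflib import ndiff


def render_changes(old_lines, new_lines):
    """Turn a linewise diff into marked-up lines.

    Deleted lines get a stand-in "strikethrough" marker, inserted lines
    a stand-in "highlight" marker, unchanged lines pass through as-is.
    """
    rendered = []
    for line in ndiff(old_lines, new_lines):
        tag, text = line[:2], line[2:]
        if tag == "- ":
            rendered.append("[-" + text + "-]")   # would be strikethrough
        elif tag == "+ ":
            rendered.append("{+" + text + "+}")   # would be highlighted
        elif tag == "  ":
            rendered.append(text)
        # "? " lines are ndiff's intraline hints; they are skipped here.
    return rendered


for line in render_changes(["one", "two", "three"], ["one", "2", "three"]):
    print(line)
```

Making the marked ranges clickable would need each one to carry some identifier back to the edit that produced it, which this sketch does not attempt.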

Anyway … this is just random speculation; but if you find the BSD license to be non-toxic, you might want to look at the FreeBSD sources to diff and consider whether you can wrap them in a library and use them to drive your own editor.

(He says, teaching grannie to suck eggs.)

Update: on RTFM’ing, it looks like the GNU diff program can spit out ed-compatible edit commands; the BSD one is limited to patch(1) compatible output, and does linewise stuff. Hmm. And roll-your-own minimum edit distance stuff is hard (been there, got the scars – many years ago).
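
For reference, the distance itself is compact; a minimal Python sketch of the textbook dynamic-programming version is below (the function name edit_distance is just for illustration). The hard part is everything around it: recovering a readable edit script from the table, and doing so quickly on book-length input.

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```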

Keith,

For me, it really depends on the type of change I’m evaluating. If I know only a few words were changed, it is Words. If a more extended series of changes was carried on, it is Clauses (finer than Paragraphs, but less confused than Words).

Paragraphs, I only seem to find useful when full blocks of text were added or removed. Otherwise, it seems to lack the needed finesse with ordinary revisions.

Paolo

Charlie,

Hoping I’m not transforming this into a Nisus forum, here is another hint related to that program:

You can do something similar with another macro (Open Copy). It does not update the copy while you edit the original, but I guess you can open a new copy each time you need an updated reference document.

Paolo

For anyone interested, I’ve uploaded several files showing how Nisus Writer’s “Compare Documents” macro processes things. You can download them here:

literatureandlatte.com/misc/NisusDiff.zip

The NisusDiff archive contains 6 files, as follows:

LoremIpsumOriginal.rtf - the original document, five paragraphs of lorem ipsum text.
LoremIpsumEdited.rtf - an edited version of LoremIpsumOriginal
ChangeInfo.rtf - a description of all the changes made to the original file in LoremIpsumEdited.rtf

The final three files are merged diff documents, created from the original and edited RTF files using the “Compare Documents” macro. The macro has three modes - compare by paragraph, by clause, or by word. There are thus three corresponding documents, each one created from a comparison between the original and edited documents in a different comparison mode:

LoremIpsumByParagraph.rtf
LoremIpsumByClause.rtf
LoremIpsumByWord.rtf

(In order to see the highlighting, I recommend opening these files in TextEdit, Bean or Nisus, as the Macro applies highlights using the \cb - background colour - RTF tag, instead of using true RTF highlights, meaning the highlights don’t show up in many RTF editors, including Word and Pages. Er, or you can just import them into a Scrivener project, of course, as Scrivener reads OS X text system highlighting.)

Did you hear that? Something just went right over my head!

From what I can see, this is exactly what the Nisus macro does using the diff utility and a couple of RTF files (though of course it converts them to plain text first so you don’t see any formatting changes in the comparison, which would be a nightmare anyway).

I don’t think I was ever considering anything like that anyway. The main idea of how it would work is this: Scrivener 2.0 has the snapshots moved to the inspector. You can also drag snapshots from the “show snapshots” table into the header bar of an editor, to view the text of a snapshot, read-only, in an editor pane. So to use the comparison tool usefully, here is how I see it:

• You’ve sent a document off, and now you’ve received it back with lots of changes in it. Scrivener can’t read track changes or anything, but it would be able to do a very basic comparison on the plain text content. So you take a snapshot of the original document, and then you select the whole thing and copy and paste in the new version. Now you take another snapshot.
• You split Scrivener’s editor, so you have the new version of the document in the bottom editor, and whatever else in the top.
• In the snapshots pane in the inspector, you now select the earlier version of the document and drag it into the header bar of the top editor. So now you have the older version, read-only, in the top editor, and the newer version, editable, in the bottom editor.
• Now you select both snapshots in the table in the Snapshots pane of the inspector and click on a button - “Compare by Paragraph”, “Compare by Clause” or “Compare by Word” (I’ll worry how to fit such buttons in the inspector later…). Now the text in the inspector shows a diff’d version of the two documents, merged and compared, with green and pink highlighting or different text colours (or whatever you set in the preferences to represent additions and deletions) and strikethroughs.
• So at this stage, you have the old version of the document in the top pane, the new in the bottom, and a highlighted comparison in the inspector. You can now scroll through the version in the inspector looking for changes, and use that to find the changes in the two files. (Or I suppose there should also be a way to get the compared version into one of the main editors, too.)

This should be a decent tool for finding the changes between two versions of a document. And if you wanted to find the changes to a whole manuscript, you would use this process on a compiled version of the MS, and have the comparison open in one pane while browsing through your original documents in another.

Does this make sense? Based on the files generated from Nisus, does it look as though it may be useful? I think it could be very useful myself, and am rather excited about the whole thing.

Not at all! Remember that I taught myself Cocoa to create Scrivener, so there is a lot of stuff I feel rather sheepish about not knowing, including pretty much everything you just said!

Many thanks and all the best,
Keith

Okay, I’ve done a little work on this. I took a look at how Kino’s Nisus Macro used diff (essentially by splitting a document’s words or clauses across different lines, running diff and then sewing them back together again with formatting) and have done something very similar in Cocoa. On top of “Diff by Word”, “Diff by Clause” and “Diff by Paragraph”, I’ve added a “Diff by All”. This takes a quick look at the text first and tries to decide whether to break paragraphs down into clauses and clauses down into words depending on the content of the other document. It should usually provide the best of the other methods.
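
The split, diff and sew-back-together steps can be sketched roughly as follows. This is Python, not the Cocoa code, and it is only an approximation under stated assumptions: difflib.SequenceMatcher stands in for running /usr/bin/diff over one token per line, the regular expressions are a guess at what counts as a word or clause, and the [-...-] / {+...+} markers stand in for strikethrough and highlight formatting. The names tokenize and compare are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text, mode):
    """Split text into tokens that keep their trailing whitespace,
    so that joining the tokens reproduces the original text."""
    if mode == "paragraph":
        return text.splitlines(keepends=True)
    if mode == "clause":
        return re.findall(r"[^,.;:!?]+[,.;:!?]*\s*|[,.;:!?]+\s*", text)
    return re.findall(r"\S+\s*|\s+", text)  # mode == "word"


def compare(old, new, mode="word"):
    """Diff two texts at the chosen granularity and merge the result
    into one marked-up string."""
    a, b = tokenize(old, mode), tokenize(new, mode)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pieces = []
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            pieces.append("".join(a[a1:a2]))
        if op in ("delete", "replace"):
            pieces.append("[-" + "".join(a[a1:a2]) + "-]")
        if op in ("insert", "replace"):
            pieces.append("{+" + "".join(b[b1:b2]) + "+}")
    return "".join(pieces)


print(compare("The quick fox", "The slow fox", mode="word"))
```

Because the tokens carry their own trailing whitespace, the unchanged parts of the merged output read exactly as they did in the original.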

You can try it out here:

Download the Cocoa “compare text” test app

EDIT: An updated version here -

UPDATED Text Comparison Demo

EDIT 2: Updated again:

UPDATED Text Comparison Demo 2

Put the original text in the top pane and the edited text in the bottom pane, then click on one of the “Diff” buttons at the bottom. The result - text combined from both with formatting to indicate what has been removed and what has been added - will appear in the text view on the right.

Obviously this is a plain text comparison only and will not show any changes to formatting (the text in the right will use the font and formatting that is most common in the newer document, but will have no other formatting than that and the indications of textual differences). As it uses /usr/bin/diff, it cannot process formatting at all.

I think it works quite well, though, and that it should be useful for checking the differences between an older version of a document (snapshot) and a newer version if built into 2.0. Let me know what you think.

All the best,
Keith

Interesting. I think it could be very useful.

On the basis of a very quick test the only thing that seems to throw it is punctuation, where a stray comma or full-stop in the “edited” version causes it to strike through the original containing clause in its entirety and then insert the amended version in the “difference” pane (rather than merely highlighting the punctuation itself). But maybe that’s to draw one’s attention to it — a feature not a bug!

H

I like the direction your thoughts are moving, Keith. I know absolutely nothing–even less than that–about programming and difficulties therein, but this would be very useful to me and allow me to stay within Scrivener as much as possible. I really hope it makes it into version 2.0 or 2.0+.

JRP - did you try the demo linked in the post above to get an idea of how it would work? That’s pretty much exactly what would be in 2.0.

Hugh - thanks for looking at it. Yeah, the punctuation thing is because of the way it breaks things up. Basically, it takes a paragraph and checks to see if that appears in the other document. If it does, it doesn’t do anything. If it doesn’t, it splits it up at punctuation and checks to see if each chunk is in the other document. If not, it breaks each chunk into words. But in the case of the following:

This is a sentence this should have a pause.

Becoming:

This is a sentence, this should have a pause.

Then both parts of the sentence exist in the first document so they don’t get split up into words - if they did, then only the punctuation would get highlighted. I’ll give it a little more thought…
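
The decision logic described above can be sketched like this, in Python rather than the actual Cocoa code. It is a reconstruction from the description only: the function names are hypothetical, and "appears in the other document" is assumed to mean a simple substring check with punctuation treated as a separator.

```python
import re


def split_clauses(paragraph):
    """Split at punctuation, dropping the punctuation itself."""
    parts = re.split(r"[,.;:!?]", paragraph)
    return [part.strip() for part in parts if part.strip()]


def adaptive_tokens(text, other):
    """Choose a granularity per chunk: keep a whole paragraph if the
    other document contains it, otherwise try its clauses, and only
    break a clause into words if that clause is missing too."""
    tokens = []
    for paragraph in text.splitlines():
        if not paragraph.strip():
            continue
        if paragraph in other:
            tokens.append(paragraph)
            continue
        for clause in split_clauses(paragraph):
            if clause in other:
                tokens.append(clause)
            else:
                tokens.extend(clause.split())
    return tokens


old = "This is a sentence this should have a pause."
new = "This is a sentence, this should have a pause."
print(adaptive_tokens(new, old))
```

With the two sentences above, both clauses of the edited text are found in the original, so nothing is broken down into words and the added comma cannot be singled out on its own.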

Thanks and all the best,
Keith

Okay, I’ve updated it slightly so that it shouldn’t repeat whole clauses if punctuation is added. Try this version:

UPDATED Text Comparison Demo

Am I right in thinking this is Intel only?

Best, Martin BB.

Hmm, possible - I tested it on Tiger but not on a PPC machine. Try re-downloading it - I’ve just rebuilt it so that it should work on PPC architectures too… Hopefully.
All the best,
Keith

OK, thanks, I’ll give it a go.

Best, Martin BB.

Seems to be OK on a very brief test. Many thanks,

Martin BB.

Keith,
I am impressed! I haven’t had time for a detailed test, but an initial trial was, well, fun. I was a little uncertain of the function of the All/Paragraph/Clause/Word distinction, but a quick play around soon cleared it up. I can easily see how using these could help quickly find the location of changes in a large document, then narrow down to specific changes.
I like this. A lot. :smiley:
Tim

Yeah, the idea is that there should only be one setting - the “All” setting in the test. The other settings are there just because it was based partly on Kino’s Nisus macro, which provides those three. The “All” is an attempt on my part to get the best of all three so that only one setting is necessary.
All the best,
Keith