In lieu of "track changes"...

jharrison · April 14, 2007, 3:11am

What about integrating and extending the snapshot feature so that you could show a “diff” of the current document vs the last snapshot (or any two snapshots) in the split window and click through the changes? If this could be done from the perspective of an assembled document, it would be very useful and potentially more functional than Word’s track changes. For reference, TextWrangler implements a document comparison feature along these lines.

KB · April 14, 2007, 10:26am

I think this would be incredibly difficult to implement. I have no idea how I would say which parts of a document are different and which parts are the same. Suppose I scan through every letter of both documents until I reach something different. Okay. But then what if the something different was an insertion? How do I know which document the insertion is in? Do I then scan through the rest of one document to see if there is anything that looks the same as the next piece of text in the other document, and if not, go back and do the same in the other? And what constitutes being the same? Two letters? Three letters? A word?

It’s really a very complicated process to work out the differences between two documents - which is why I went for the much simpler and more straightforward option of Snapshots!

All the best,
Keith

howarth · April 14, 2007, 2:46pm

Alternatives to Word’s Track Changes:

DocuComp II 1.03 (Classic only)
Guiffy 8.0
TextWrangler 2.2 and BBEdit 8.6.1
zsCompare 3.03

All of these applications compare files and report differences, and most merge the files as well. Guiffy and zsCompare also compare folders. The latest versions of DocuComp run only on Windows, but they have a server version that is Unix compatible. I’ve written to ask if they expect any further development for the Mac platform. For the moment, TextWrangler is the best alternative, and it’s free.

jharrison · April 14, 2007, 3:06pm

I understand that “track changes” has been discussed before and it wouldn’t be appropriate to implement a full-blown Word-like feature with cute graphics, etc. However, a really useful comparison feature may not be as difficult as you think. You’re part way there with snapshots. The snapshot (or the older snapshot if you’re comparing two snapshots) is the reference copy, and all change tracking would occur only as a comparison to the reference copy. Anything that’s in the newer copy and not in the older copy is defined as an insertion. The converse is defined as a deletion.

The classical diff tools did a line-by-line comparison, not purely character by character. Programs like TextWrangler that can soft wrap extend that to a paragraph-by-paragraph comparison (each paragraph functions like a line). Thus the program steps through the file comparing paragraphs, and once the changes are logged the comparison display in a split screen autoscrolls to display corresponding paragraphs and highlights the differences in each. Deletions are typically indicated by highlighted text in the older copy, not strikethrough embedded in the current copy as in Word. I’m also assuming that what’s really of interest are the textual changes and that we can ignore style-related changes in the RTF.
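As a rough illustration of the "each paragraph functions like a line" idea (not how TextWrangler or Scrivener actually work), Python's standard-library difflib can align two lists of paragraphs the way classic diff aligns lines. This sketch assumes paragraphs are separated by blank lines and compares plain text only, ignoring RTF styling; split_paragraphs and paragraph_diff are hypothetical helper names.

```python
import difflib
import re


def split_paragraphs(text):
    """Split plain text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def paragraph_diff(old_text, new_text):
    """Align paragraphs like classic diff aligns lines; return the non-equal runs."""
    old = split_paragraphs(old_text)
    new = split_paragraphs(new_text)
    # Each paragraph is one "line": two paragraphs match only if identical.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete' or 'insert'
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


old = "This is the first paragraph.\n\nThis is the second paragraph."
new = ("This is the first paragraph.\n\nThis is an inserted paragraph.\n\n"
       "This is the second paragraph.")
print(paragraph_diff(old, new))
```

Any edited paragraph comes back as a whole 'replace' run; highlighting the words that changed inside it would need a second, finer-grained pass over just those paragraphs.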

If you wished greater granularity, you could experiment with splitting on apparent sentences (a period, one or more spaces, then a capital letter), though paragraphs would probably be fine for the purpose. Any diff-specific reversion features would be based on the level of granularity chosen (by whole paragraph or sentence, not individual edited word or character) and would only apply when the comparison was between a current document and its most recent snapshot.
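The period-space-capital rule is easy to express as a regular expression. This is a minimal Python sketch of that heuristic only (split_sentences is a hypothetical name); the example deliberately shows why the result is only an "apparent" sentence: abbreviations such as "Mr." trigger false breaks, and a sentence starting in lower case is missed.

```python
import re

# An "apparent" sentence break: a period, then whitespace, then a capital letter.
SENTENCE_BREAK = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(paragraph):
    """Split a paragraph on apparent sentence boundaries (a heuristic, not a parser)."""
    return [s for s in SENTENCE_BREAK.split(paragraph.strip()) if s]


print(split_sentences("Mr. Smith left. He was tired. it rained."))
# ['Mr.', 'Smith left.', 'He was tired. it rained.']
```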

Diff algorithms are widely published in a number of computer languages and in pseudocode; see, for example, the Wikipedia article on diff, or search Google for "diff algorithm."
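The usual textbook starting point is the longest common subsequence (LCS). Below is a minimal Python sketch of the plain dynamic-programming version over word tokens; it takes O(n·m) time and memory, so it is for illustration only. Production tools use faster algorithms (for example Myers' O(ND) algorithm), and lcs_diff is a hypothetical name.

```python
def lcs_diff(old, new):
    """Diff two token lists via a longest common subsequence table.

    Returns a list of (op, token) with op '=' (kept), '-' (deleted), '+' (inserted).
    """
    n, m = len(old), len(new)
    # table[i][j] = length of the LCS of old[i:] and new[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, preferring matches, then deletions.
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("=", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", old[i]))
            i += 1
        else:
            ops.append(("+", new[j]))
            j += 1
    ops.extend(("-", tok) for tok in old[i:])
    ops.extend(("+", tok) for tok in new[j:])
    return ops


old = "To be, that is the question".split()
new = "To be or not to be, that is the question".split()
for op, tok in lcs_diff(old, new):
    print(op, tok)
```

The table answers "which side is the insertion in?" without any lookahead guessing. On this pair it reports "be or not to" as inserted rather than "or not to be"; both are equally short diffs, and choosing the more natural one is a separate cleanup step.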

My point in bringing this up is that there might be an opportunity for a very useful “Scrivener-style” (that is, what a writer would really need) comparison feature here. The only additional “cute” feature of interest would be the ability to easily attach a note or link to a paragraph or sentence containing a “difference” that would allow some explanation or commentary.

jharrison · April 14, 2007, 3:37pm

Oops – I just noticed that this topic was recently addressed in this forum under the heading “Snapshot contrast and compare.” Sorry for the topic duplication.

KB · April 14, 2007, 5:19pm

I don’t want to spend too many words on this as I’m already behind on 1.04 and the new seed of Leopard is waylaying me big time, but…

Okay, consider these short texts:

[code]This is the first paragraph.

This is the second paragraph.[/code]

[code]This is the amazing paragraph that comes first.

This is an inserted paragraph.

This is the revised second paragraph.[/code]

Now, where would Scrivener begin comparing these? Let’s go paragraph by paragraph, as you suggest:

1st paragraph:

“This is the first paragraph.” vs “This is the amazing paragraph that comes first.”

Okay, everything is fine right up to “This is the ”. Presumably, Scrivener would have to go through character by character. Or would it go word by word? Then what? Okay, the next characters are different. We have “paragraph” and “first” in this paragraph again, but they are in a different order, and there are some insertions. How does Scrivener work out exactly what is different? You can say, “Well, it just looks at the words and can see that it maybe needs to highlight ‘amazing paragraph that comes’.” Easy to say, but how do you do it in code? Not so easy at all. (It is always very easy to say, “I don’t think it would be very hard…” if you’re not the one who has to code it.)

Well, maybe we just take a paragraph-by-paragraph comparison, like you say, and mark everything as different in the paragraph from the first character that is different in the comparison. That would be fairly straightforward (for the first paragraph if no paragraphs have been inserted - see below). The following words in our example text would be marked as changed in this case: “amazing paragraph that comes first”. But then consider these two paragraphs:

To be, that is the question, whether it is nobler in the mind to suffer the slings and arrows of outrageous fortune or to take arms against a sea of troubles and by opposing end them.

To be or not to be, that is the question, whether it is nobler in the mind to suffer the slings and arrows of outrageous fortune or to take arms against a sea of troubles and by opposing end them.

Hmm, now our up-to-the-first-change approach falls down, because only the first two words of the paragraph will be noted as unchanged: " or not to be, that is the question, whether it is nobler in the mind to suffer the slings and arrows of outrageous fortune or to take arms against a sea of troubles and by opposing end them" will all be marked as changed.
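To make the "mark everything from the first differing character" rule concrete, here is a minimal Python sketch of it (mark_from_first_difference is a hypothetical name, and the Hamlet lines are shortened). It reproduces both outcomes described above: it looks reasonable on the first example and falls apart on the second.

```python
def mark_from_first_difference(old, new):
    """Naive rule: everything in `new` after the first differing character is "changed".

    Returns (unchanged_prefix, text_marked_as_changed).
    """
    k = 0
    limit = min(len(old), len(new))
    while k < limit and old[k] == new[k]:
        k += 1
    return new[:k], new[k:]


unchanged, changed = mark_from_first_difference(
    "To be, that is the question.",
    "To be or not to be, that is the question.",
)
print(repr(unchanged))  # 'To be'
print(repr(changed))    # ' or not to be, that is the question.'
```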

Again, I ask: upon encountering a character that is different in each paragraph, how does Scrivener then decide where the text resumes unchanged? Firstly it would need to look at the next letter of the original file, and then it would parse forward to the next occurrence of that letter in the newer version. Then it would check the next letter of the original file, and check to see if the next letter in the newer version is the same. If not, it will have to go back to looking for the first changed letter. If it is the same, it will have to check the next letter. But how many letters does it take before Scrivener can say that they match? Consider these two sentences:

The hairy fox liked to play dice.

The hair of the fox-trotting lady looked like it had been plastered to her dirty forehead with cement.

Potentially, in the second version - which is clearly not even linked to the first sentence in any way - it could get marked thus (bold indicating changes, non-bold indicating text that is unchanged):

The hair[b] of the[/b] fox[b]-trotting lady looked[/b] like[b] it had been plastered[/b] to [b]her [/b]di[b]rty forehead with [/b]ce[b]ment.[/b]
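One common heuristic for "how many letters make a match" is simply to make it a tunable threshold: align the texts character by character, then throw away any shared run shorter than some minimum, so stray letters like "di" and "ce" count as changed. This Python sketch uses difflib for the alignment; unchanged_runs and min_len are hypothetical names, and the threshold is an arbitrary choice, which rather proves the point being made here. Discarding short runs after alignment also does not re-align the remaining text.

```python
import difflib


def unchanged_runs(old, new, min_len=4):
    """Return the runs of `new` treated as unchanged.

    Character-level alignment via difflib, then any matching run shorter
    than `min_len` characters is discarded (i.e. counted as changed).
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    runs = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; the size test drops it.
        if block.size >= min_len:
            runs.append(new[block.b:block.b + block.size])
    return runs


old = "The hairy fox liked to play dice."
new = ("The hair of the fox-trotting lady looked like it had been "
       "plastered to her dirty forehead with cement.")
print(unchanged_runs(old, new, min_len=1))  # every shared run, however short
print(unchanged_runs(old, new, min_len=4))  # only runs of 4+ characters
```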

Now let’s go back to our original example. In the original example, the first paragraph was straightforward (well, it should have been, but we have already seen that it was, in fact, not). Now we come to the second paragraph - but wait! In the second text, a paragraph has been inserted. So the last part of the original text reads:

This is the second paragraph.

And the last part of the new version reads:

[code]This is an inserted paragraph.

This is the revised second paragraph.[/code]

What happens when Scrivener comes to compare the second paragraph? Well, the inserted paragraph and the original paragraph start the same: “This is ” - so Scrivener could easily mistake them for the same paragraph. Which means that two matching words aren’t enough to decide that two paragraphs are the same. (But if we demanded more than two, then when comparing “There is one and only one way to do this” and “There is only one way to do this”, Scrivener would fail to realise that the first two words are in fact unchanged!)

So, how does Scrivener decide that a paragraph is an inserted paragraph and not just a heavily edited original paragraph? Well, you might answer, Scrivener could parse forward to the next paragraph and see which is the best fit. But what if the next paragraph is the original paragraph but has been edited so heavily that Scrivener cannot recognise it?
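The "best fit" idea can be sketched as a similarity score plus a cut-off: pair each old paragraph with its most similar unused new paragraph, and call any new paragraph left without a partner an insertion. In this Python sketch the scoring (difflib's ratio), the greedy strategy and the 0.6 threshold are all assumptions, and pair_paragraphs is a hypothetical name; it also ignores paragraph order entirely.

```python
import difflib


def pair_paragraphs(old, new, threshold=0.6):
    """Greedily pair each old paragraph with its most similar unused new one.

    Returns (pairs, inserted, deleted): pairs maps old index -> new index;
    inserted/deleted list indices that found no partner scoring >= threshold.
    """
    pairs = {}
    used = set()
    for i, old_para in enumerate(old):
        best_j, best_score = None, 0.0
        for j, new_para in enumerate(new):
            if j in used:
                continue
            score = difflib.SequenceMatcher(None, old_para, new_para,
                                            autojunk=False).ratio()
            if score >= threshold and score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            pairs[i] = best_j
            used.add(best_j)
    inserted = [j for j in range(len(new)) if j not in used]
    deleted = [i for i in range(len(old)) if i not in pairs]
    return pairs, inserted, deleted


old = ["The cat sat on the mat.", "Dogs bark at night."]
new = ["The cat sat on the mat.",
       "A completely new paragraph about weather.",
       "Dogs bark loudly at night."]
print(pair_paragraphs(old, new))
```

On clearly different paragraphs like these it behaves, but with near-identical openings such as "This is the..." the pairing depends entirely on the threshold and scoring, and a heavily edited paragraph can fall below the cut-off and be reported as a deletion plus an insertion, which is exactly the ambiguity described above.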

And this is just the parsing code. You are also suggesting that Scrivener should have a split view, note attachments and all sorts. Essentially, what you are asking for is a month or two’s worth of development on a feature that is not integral to Scrivener and not central to my original vision of Scrivener. There is a good reason I went to the length of calling the feature “Snapshots” and not “Versions”.

So, the answer to this is very positively a negative: there will be no version comparison in a 1.x release of Scrivener. After 1.04, in fact, new features will not be added anywhere near as often - after 1.04, it is going to be mainly bug fixes. I have been saying that since 1.0 (and it’s in the readme and on the website!) but this time, honest guv’, I’m going to get back to my writing with Scrivener. 1.04 is already becoming a horrible chore…

Thanks for taking the time to post suggestions,

All the best,
Keith

P.S. AmberV, if you come across this, could you link to this thread in the FAQ? This question is asked occasionally, and this is my comprehensive reply.

howarth · April 14, 2007, 7:20pm

I just want to make clear that I listed those other programs to indicate that comparison software is already available. In my view, there is no need to add this function to Scrivener. TextWrangler does it for free.

dafu · April 14, 2007, 7:53pm

Or, you could trundle over to the Terminal and use “diff”.

Of course, then you’d have to read the near-one-hundred page manual.

Dave

KB · April 14, 2007, 7:56pm

Hey howarth, yes, I know you weren’t asking for this feature.

And I hope my post doesn’t come across as “arsey” as we say over here. It’s something I get asked for occasionally and just wanted to explain exactly why I think it is a big code challenge and not something I want to undertake. I certainly wasn’t trying to denigrate the suggestion, which is quite legitimate and understandable.

All the best,
Keith

jharrison · April 15, 2007, 3:01am

OKaaaay…I guess I’m not the first to raise this issue.

I do understand that Scrivener is being written to satisfy a particular vision and set of priorities. That vision is making a contribution to the way that software writing tools are understood and I think it will have a beneficial and lasting impact. I have no problem with Scrivener staying on that path. I’m also not a Cocoa programmer, so all my comments are tentative.

That said, KB did ask some (perhaps rhetorical) questions above…

If I were going to approach this problem, I’d just pick a published algorithm with the characteristics I wanted from the more than 30-year history of diff programs, and implement that. As a starting point, I’d probably take a look at Heckel’s 1978 paper “A technique for isolating differences between files” and then review Cacycle’s public domain diff.js, which is based on Heckel’s paper and is the JavaScript diff code used in wikEd, the Wikipedia online editor. diff.js is a word-based comparison and it’s lengthy (over 900 lines of JavaScript), but it’s well-commented and seems to have a good reputation.
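For a feel of what Heckel's technique involves, here is a simplified Python reading of its passes: count every token in both files, treat tokens that occur exactly once in each as anchors, then grow matched runs forwards and backwards through neighbouring identical tokens. This is an illustrative sketch over word tokens (Heckel's paper works on lines), not a port of diff.js, and heckel_links is a hypothetical name.

```python
from collections import defaultdict


def heckel_links(old, new):
    """Simplified sketch of Heckel's (1978) passes over token lists.

    Returns (new_to_old, old_to_new); an entry of None means "no partner",
    i.e. an insertion (in new) or a deletion (in old).
    """
    # Passes 1-2: a symbol table counting each token in each file,
    # remembering where it occurred in the old file.
    table = defaultdict(lambda: {"old": 0, "new": 0, "old_index": None})
    for tok in new:
        table[tok]["new"] += 1
    for j, tok in enumerate(old):
        table[tok]["old"] += 1
        table[tok]["old_index"] = j

    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Pass 3: tokens that occur exactly once in each file are anchors.
    for i, tok in enumerate(new):
        entry = table[tok]
        if entry["old"] == 1 and entry["new"] == 1:
            j = entry["old_index"]
            new_to_old[i], old_to_new[j] = j, i

    def link(i, j):
        # Link new[i] to old[j] if both are free and hold the same token.
        if (0 <= i < len(new) and 0 <= j < len(old)
                and new_to_old[i] is None and old_to_new[j] is None
                and new[i] == old[j]):
            new_to_old[i], old_to_new[j] = j, i

    # Pass 4: grow each matched run forward.
    for i in range(len(new)):
        if new_to_old[i] is not None:
            link(i + 1, new_to_old[i] + 1)
    # Pass 5: grow each matched run backward.
    for i in range(len(new) - 1, -1, -1):
        if new_to_old[i] is not None:
            link(i - 1, new_to_old[i] - 1)

    return new_to_old, old_to_new


old = "To be, that is the question".split()
new = "To be or not to be, that is the question".split()
new_to_old, _ = heckel_links(old, new)
print([tok for tok, j in zip(new, new_to_old) if j is None])  # inserted words
```

Repeated tokens such as "the" never become anchors on their own; they only get matched when they sit next to an already-matched run, which is what passes 4 and 5 are for.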

As an alternative, I’d also look at XinDiff, which has an LGPL JavaScript implementation with a functional demo available in a web page. The code is in the page source and is a bit less than 200 lines of JavaScript. It allows a selectable character, word or line diff and formats a merged output file showing all changes (other displays are possible). It’s not fast–we are talking JavaScript in a browser here–but it appeared to do a good job comparing two versions of a 2600+ word manuscript excerpt of mine with more than 80 addition and deletion differences.
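The selectable-granularity, merged-output idea can be sketched in a few lines of Python. This is not XinDiff's code: the tokenizer, the function names (tokenize, merged_diff) and the [-deleted-]{+inserted+} markers (borrowed from the style of git's --word-diff=plain output) are assumptions for illustration, with difflib doing the alignment.

```python
import difflib
import re


def tokenize(text, granularity):
    """Split text so that joining the tokens reproduces it exactly."""
    if granularity == "char":
        return list(text)
    if granularity == "word":
        # Words and the whitespace between them are separate tokens.
        return re.findall(r"\s+|\S+", text)
    if granularity == "line":
        return text.splitlines(keepends=True)
    raise ValueError(f"unknown granularity: {granularity!r}")


def merged_diff(old, new, granularity="word"):
    """One merged text: deletions as [-...-], insertions as {+...+}."""
    a, b = tokenize(old, granularity), tokenize(new, granularity)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:  # 'delete' or 'replace': show the old text
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:  # 'insert' or 'replace': show the new text
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


print(merged_diff("The cat sat.", "The dog sat."))            # The [-cat-]{+dog+} sat.
print(merged_diff("abc", "abd", granularity="char"))          # ab[-c-]{+d+}
```

Keeping whitespace as its own token in word mode means the merged text can be reassembled exactly, which matters if the result is to be shown back in an editor rather than in a separate report.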

Regarding the UI, Scrivener already has an excellent split screen display that’s used in several settings and it would seem logical to tap that for diff views. The comment about change notes was just an aside–a nice thing to have but not critical, and it would require some thought to implement.

I’m not sure I understand the subsequent comments about TextWrangler, diff and other external comparison tools. Running a generic multi-directory comparison against two Scrivener packages (I’ve tried FileMerge) does show differences, but that’s not particularly useful since the files inside the package as visualized by these tools are not named or organized similarly to the Scrivener display, and you can run them only against separate packages, not snapshots in the same package. Plus, what you really want to do is inspect changes and possibly edit them further in the Scrivener writing environment; trying to note the position of a number of changes in generically-named file fragments and then find them in Scrivener just isn’t a solution. Cutting and pasting into TextWrangler to do a diff doesn’t seem workable either. The ideal for someday would be to have this as part of the supported editing workflow.

Hopefully I’ve responded usefully to the questions raised. I do understand that this would not be a trivial task and there is a need to focus effort where it most effectively advances the vision. Best wishes for continued success. – JH

AmberV · April 15, 2007, 5:31pm

Got it; thanks.

brett · April 16, 2007, 10:52pm

Just spotted this, which appears to do the same thing as the other text compare utilities mentioned.
I remember using DocuComp a few times back in the day, but, for the reasons Keith explained, even a few changes would prompt so many change indications that it quickly became difficult to sort them all out. It really only worked well when dealing with near-final copy, where only a few changes would be made between versions. For that reason, version comparison seems most appropriate in the post-Scrivener formatting phase (i.e. after you send a near-finished doc to a coauthor or editor) rather than Scrivener’s story development stage.
I reluctantly agree that Word’s implementation of track changes is pretty good. I wish somebody would come up with an app that compares rtfs, but given Keith’s explanation, I’m not sure it’s worth the time and resources it would take to add it to Scriv even if Keith were so inclined. Of course I’d use it if he did, but…

Gordon · April 18, 2007, 8:36pm

FileMerge.app, free from Apple when you install the developer tools. A very nice, graphical compare utility for text files.

xiamenese · April 19, 2007, 12:50am

Seems a lot to ask, to fill up the hard disk with developer tools that you’re not going to use, just to get FileMerge … If it were possible to install FileMerge without the rest it would be a more interesting proposition.

Mark

Gordon · April 20, 2007, 12:06am

Oh, c’mon. Just move FileMerge.app to your Applications folder, then delete the top-level Developer folder. This is hardly a burden, and it’s free, and you already have the tools with the DVDs that came with your computer.

Khadrelt · April 20, 2007, 3:27pm

There’s a lot of other cool stuff in the Developer tools, too.

AndrewG · April 22, 2007, 10:22pm

I just caught this thread. I’ve written file-comparing code in the past, and it’s no picnic. Plus working out a good UI for it would probably be a real hassle.

Like others, I compare my manuscripts by first exporting the Snapshots I want to compare, and then doing the actual file comparison in Word. FileMerge is nice, but going through Word lets me stay in touch with my formatting. Of course I have the hassle of shuttling back and forth between Word and Scrivener to actually implement my changes. Scriv’s “Project Search…” is the indispensable tool here that finds the right file for the text I want to change. It’s a slog to grind through a lot of changes this way, but happily my style is such that I don’t do this often.

-Andrew