Source control/version control integration

I’ve been quietly pondering this a while now. I even mentioned it in this previous thread. However, I think it should be a separate topic.

I would love to see Scrivener have limited ability to tie in to various third-party source control programs (Git, CVS, Subversion, etc.). It wouldn’t have to be much – some limited UI elements to allow users to check-in/check-out/pull/push the current project. I would fully expect something like this to require the user to install the prerequisite source control programs and point Scrivener to them, or even supply the associated command lines.

It’s probably more work than I think it is, but it would be a neat way to allow some group collaboration and synchronization across multiple devices, as the whole point is that you’re saving a group of files all together at the same time.

While I don’t have such a use case personally, I agree that this might make sense.

I am thinking about documents that are published time and again in a revised, updated version (manuals and such). Here, collaboration and tagging of releases via source control systems might be useful. Also, if you translate stuff, it might be an idea to have a branch for each target language and some mechanism which points to related original- and translated-language-versions of the document.

Isn’t this what Scrivener’s snapshot feature is for?

Sort of, but not really. Source control is much more fine-grained: only the differences are saved, so you can save frequently without blowing up disk space usage. And since differences are saved, you can look back and see exactly when a particular change was made without having to run diffs on the entire document.

The problem is: what is the “difference” between one version of an RTF file and the next? Source code control is simplistic and remains (even after almost a century) wedded to the paradigm of an 80-character wide punch card image. Differences between those are easy to calculate, but when an RTF line is arbitrarily long, with a mix of directives and textual content, the differences are not easy to compute. Is the change in the “line” caused by editing the text itself or by changing the attributes? And what would be a line in a Binder document anyway? Picking an RTF file at random from one of my projects, its first line in Scrivener is four lines on the screen (under the current ruler settings), but those are all one single line in the RTF file itself. That line of RTF is 493 characters long. A couple of lines down in the RTF file is another one that is almost 1K characters long, which Scrivener displays as nine lines under the current ruler settings. Sure, that file could be put into RCS, Subversion, Git or some other flavour-of-the-month source code control system, but the moment I change one character in the text or one attribute, the suggested savings are not so small.
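To make the long-line concern concrete, here is a minimal Python sketch. It is only an illustration of how a line-oriented diff reports a change, not a description of how any particular tool stores data on disk. The helper name `changed_chars` and the sample paragraph are made up for this example.

```python
import difflib
import textwrap


def changed_chars(old_lines, new_lines):
    """Count characters on lines that a line-based diff marks as removed or added."""
    total = 0
    for line in difflib.ndiff(old_lines, new_lines):
        # ndiff prefixes removed lines with "- " and added lines with "+ ".
        if line.startswith(("- ", "+ ")):
            total += len(line) - 2
    return total


def build_example():
    """Return (original, edited) paragraphs that differ by a single character."""
    original = " ".join(f"word{i}" for i in range(80))
    edited = original.replace("word40", "ward40")
    return original, edited


original, edited = build_example()

# Case 1: the paragraph is one long line, as in an RTF file.
as_one_line = changed_chars([original], [edited])

# Case 2: the same text wrapped at 80 columns, as in typical source code.
as_wrapped = changed_chars(textwrap.wrap(original, 80), textwrap.wrap(edited, 80))

print("one long line:", as_one_line, "characters reported as changed")
print("wrapped at 80:", as_wrapped, "characters reported as changed")
```

With the single long line, the whole paragraph shows up as removed and re-added; with wrapped text, only the one short line containing the edit does.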

I haven’t played with it, as I’m mainly a Windows user, but my impression was that it was there to give you multiple point-in-time versions of your work as a safety net, not to allow the full branch/merge workflow that source control gives.

I’m less worried about the diff and compare issues another poster talked about – as you point out, source code text/Unicode files are not RTF files – and more interested in being able to say “this is the definitive project as of THIS time” and easily go to any of those saved states without affecting anything else.

I’m a million miles away from knowing anything about this topic - but the discussion so far has really only focussed on a Scrivener project as if it’s a single file. As we all know, a project can consist of hundreds, possibly thousands, of inter-related and frequently updated files. Are any of the systems listed above capable of dealing with such complexity, and doing so utterly reliably? (I raise this also partly because when the suggestion of version control has been made in the past, the folder-nature of a Scrivener project has been cited as an issue - a search in these forums will reveal previous threads.) But as I say, I’m a version-control ignoramus.

Version control systems store changes; stored versions of files that didn’t change don’t get changed. Programming projects routinely rely on version control to manage changes in thousands of files made independently by dozens of programmers. It’s been around for decades; it’s mature technology.
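As a rough illustration of why thousands of files are not a problem, here is a toy Python model of a content-addressed store. It is loosely inspired by how tools such as Git avoid storing unchanged files again, but it is not their actual implementation; the class name `ToyStore` and its methods are invented for this sketch.

```python
import hashlib


class ToyStore:
    """A toy version store: file contents are kept once, keyed by their hash."""

    def __init__(self):
        self.blobs = {}    # digest -> file content
        self.commits = []  # each commit maps path -> digest

    def commit(self, files):
        """Record a snapshot of `files` (path -> bytes); return how many new blobs were stored."""
        snapshot = {}
        new_blobs = 0
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:
                # Only content we have never seen before costs extra storage.
                self.blobs[digest] = content
                new_blobs += 1
            snapshot[path] = digest
        self.commits.append(snapshot)
        return new_blobs


# A project with a thousand small documents.
project = {f"{i}.rtf": f"text of document {i}".encode() for i in range(1000)}
store = ToyStore()
first = store.commit(project)

# Edit a single document and commit again.
project["42.rtf"] = b"revised text of document 42"
second = store.commit(project)

print("blobs stored by first commit:", first)
print("blobs stored by second commit:", second)
```

Each commit still describes the whole project, yet the second one only adds storage for the single document that changed.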

What is ‘anything else’ in this context? I can’t imagine that a properly integrated version control system would affect a different project from the one you’re working with, so if you go back to a previous state of a project (which is the biggest collection of data that Scrivener manages itself) what else is there to leave unaffected?

EDIT: (apologies for my previous misattribution)

Agree with you Robert but that attribution is wrong; it was all devinganger’s work not one word from me.

I still think that snapshots are what is being discussed here. (But I don’t use them myself so can’t be authoritative.)

The difference between VCS expectations and what Scriv does is actually much simpler to understand.

The CONTENTS of some files are based on the CONTENTS of other files. So if you roll back file Q you would need to roll back file Y, but file Y may relate to an explicit file G that is nonexistent in the current state. This means that you can only version a project as a WHOLE entity, not as a set of loosely related files. If you can live with the idea that you have to roll back the entire project as a tagged version, then have at it. But at that point snapshots and standard backups are much easier.

Branching and such is easy under this method as well. Granted so is copying the file as a “new” project. Actually copy is much simpler than branching.

If you really want to do this, then the current method requires shutting Scrivener down prior to check-in/check-out.

As to integrating SVN/Git directly into Scriv so you can CI/CO from the binder… interesting idea. I think the same issues would apply, but if Scriv is managing the CI/CO then it could manage the updates to linked files (rebuild link? warn for user decision? ??). I’m just not sure the feature would really be worth it for anyone but the technical types (and the technical collabs).

Yes, Jaysen’s point is the one that in my state of ignorance I was trying to make in my post above.

Snapshots, according to the Windows manual, only work on an individual document and only on the text of that document. Which makes sense, when you think about it – I’m about to make a revision to a scene and want to bookmark how the text was before I start mucking with it. That snapshot doesn’t affect any other document in the project – and BTW, snapshots don’t capture notes, synopses or meta-data.

The manual itself says:

The key thing to understand about Snapshots is that they provide a way to set save points for individual items in the binder. They are not a tool for providing an overall snapshot of the entire project, structurally speaking, and are probably not the best tool for taking a quick snapshot of your entire draft.

A source control program, which was originally designed for multi-engineer computer programming projects, allows you to add/remove/edit files in the overall project but keep track of all of those changes separately. Originally this was done via a check-out/check-in workflow, but modern versions use a more collaborative approach. Example: I create a new function in a new file. Ashley checks that file out, adds more functionality, and checks it back in. Bob checks it out, does a code review, fixes a couple of bugs, and checks it back in. I check it back out, compile the whole program, run it through its tests, declare it good, and increment the project version. Anyone who checks out that version now gets all the project documents as they existed when that version was created.

Under the newer collaborative model, I create my own local copy of the project by downloading the latest version (or a specific version) from the repository. I now have my own copies to work on, independent of anyone else’s. When I commit my edits back to the repository, they get written back up (if they were the only edits in that section of code) or I get alerted to a conflict because Ashley also worked on the same function I did. Since she committed first, I look at her changes and compare them to mine, resolve them, and commit the resulting file. If Bob incremented the version before I did so, then my commits go back to the version I’d checked out from, rather than barging into the new version.

For software projects, the ability to keep track of changes/differences in each text file is key. Single developers on small projects will often use these systems just to keep different versions straight when developing new versions or backporting bug fixes into older versions.

So how does this relate to Scrivener? Well, I don’t think a source control engine would handle the RTF and XML file structure well, nor would I want it to. Project-level granularity is good enough. However, being able to take a point-in-time backup of the entire project and save it as a separate version that I could easily revert to would be a great way not only to help manage project versions, but also to handle syncing between multiple devices or collaborating with multiple authors. It would fix some of the issues with using Dropbox and other file-sync solutions because it’s not an “always sync” type of solution.

I can do this now, clumsily, with my favorite version control tools and Scrivener. I just have to save the files, shut down Scrivener, use the tools to make the commit/check-in, etc. Gets a little time-consuming. But if that kind of integration was built in to Scrivener (transparently call the tools of your choice without re-implementing how they work, open and close projects as required) it would be pretty cool. And you could use Scrivener snapshots as they are today, to continue having a file-level snapshot granularity.

Haven’t checked in Windows, but if you select multiple binder items, the snapshot should include all those items in the ‘version’. This should be discussed in the forums ad nauseam.

The problem with the meta data is that it is divorced from the text. The contents of the binder are stored in RTF files that are named under the control of Scriv (this I’m sure we all know). All meta data will refer to that internal name (again, we know that, just stating it for clarity).

Here’s the problem:
Everyone (Joe:J, Sally:S, Random:R) checks out a legal copy of the project. At time of checkout the project has a binder-item:file list of A:1.rtf, B:2.rtf, C:3.rtf.

They then take the following actions:
J -> D:4.rtf
S -> rename C in binder to D (D:3.rtf) add new C:4.rtf
R -> Add I’m special:4.rtf, delete A-C
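The scenario above can be sketched in Python to show where the three working copies collide. This is only a toy model of the binder as a title-to-file mapping; the function names and the data layout are assumptions made for illustration, not how Scrivener actually stores its binder.

```python
from collections import defaultdict

# State of the project when everyone checked it out (binder title -> file).
BASE = {"A": "1.rtf", "B": "2.rtf", "C": "3.rtf"}

# Each person's binder after their edits, following the list above.
EDITS = {
    "J": {"A": "1.rtf", "B": "2.rtf", "C": "3.rtf", "D": "4.rtf"},
    "S": {"A": "1.rtf", "B": "2.rtf", "D": "3.rtf", "C": "4.rtf"},
    "R": {"I'm special": "4.rtf"},
}


def new_file_collisions(base, edits):
    """Return {file: {user: title}} for files created independently by two or more users."""
    known = set(base.values())
    created = defaultdict(dict)
    for user, binder in edits.items():
        for title, file in binder.items():
            if file not in known:
                created[file][user] = title
    return {f: users for f, users in created.items() if len(users) > 1}


def title_disagreements(base, edits):
    """Return {file: {user: title or None}} for base files the users no longer agree on."""
    result = {}
    for file in base.values():
        view = {}
        for user, binder in edits.items():
            titles = [t for t, f in binder.items() if f == file]
            view[user] = titles[0] if titles else None  # None means deleted
        if len(set(view.values())) > 1:
            result[file] = view
    return result


collisions = new_file_collisions(BASE, EDITS)
disagreements = title_disagreements(BASE, EDITS)
print(collisions)
print(disagreements)
```

All three people end up with a different document called 4.rtf, and they disagree about what 3.rtf is called or whether it exists at all. A VCS can detect both situations, but deciding which answer is right still falls to a person.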

It is easy for us to say “change the backing store format” but the problem remains. Even in pure source control you see this same type of issue with refactoring code (this is where we yell at junior devs and tell them to learn branching, yes?). The real problem with this whole thing is that the uber geeks among the scrivnerati who understand how we should interact with VCS are NOT the normal customer. How do you support this type of complex concept for someone that isn’t really a developer when many many developers (with extensive training) can’t really seem to get it right?

Just to reiterate, I get the value of SVN/GIT integration. I’m just not sure it is practical from a L&L implementation/support perspective.

Git (much more useful and relevant than SVN) does this and always has. As it says:

None of this is hard to handle with Git. Let’s toss SVN out the window which does indeed have a version per file and talk about Git, which is significantly more relevant and useful for the use case.

Git tracks the status of the entire repository. If you jump forward or back, every file is moved to what it was at the time of that commit, which I believe will handle the use case you are mentioning here.
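Here is a small Python sketch of that whole-repository behaviour, for anyone who wants to try it. It assumes `git` is installed and on the PATH, it works in a throwaway temporary directory, and the file names, commit messages and demo author identity are all made up for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
         "-c", "commit.gpgsign=false", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo_whole_project_rollback() -> dict:
    """Commit two states of a toy project, then check out the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")

        # State 1: two documents that belong together.
        (repo / "1.rtf").write_text("chapter one, draft A")
        (repo / "2.rtf").write_text("chapter two, draft A")
        git(repo, "add", "-A")
        git(repo, "commit", "-q", "-m", "draft A")
        first = git(repo, "rev-parse", "HEAD")

        # State 2: one document edited and a new one added.
        (repo / "1.rtf").write_text("chapter one, draft B")
        (repo / "3.rtf").write_text("chapter three, draft B")
        git(repo, "add", "-A")
        git(repo, "commit", "-q", "-m", "draft B")

        # Going back to the first commit restores the whole tree at once.
        git(repo, "checkout", "-q", first)
        return {p.name: p.read_text() for p in sorted(repo.glob("*.rtf"))}


if __name__ == "__main__":
    print(demo_whole_project_rollback())
```

After the checkout, the edited file is back to its earlier text and the file that only existed in the later commit is gone from the working folder, so related files move together rather than one at a time.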

Furthermore, this would allow people comfortable with Git to branch and work on different pieces.

First of all, I dispute that anyone who can’t handle branching fits the description “with extensive training”.

Honestly, I believe that you could add a “git commit; git push” command to Scrivener and another command to “git pull” the latest with a simple clean UI that would benefit most people without them knowing anything. At the same time it would allow those who can use Git branching and merging to work on shared trees together by using those tools outside of Scrivener.
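A minimal sketch of what those two commands might run behind the scenes, in Python. The function names are hypothetical, nothing here is an existing Scrivener feature, and the `git add --all` step is my own assumption so that new and deleted files get included in the commit. By default it is a dry run that only reports the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def save_commands(message: str) -> List[List[str]]:
    """Commands for a hypothetical 'save to repository' menu item."""
    return [
        ["git", "add", "--all"],           # assumption: stage everything in the project folder
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def fetch_commands() -> List[List[str]]:
    """Commands for a hypothetical 'get latest' menu item."""
    return [["git", "pull"]]


def run_all(project_dir: Path, commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Run each command in `project_dir`; with dry_run=True, only report what would run."""
    log = []
    for cmd in commands:
        log.append(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, e.g. a commit with nothing to commit.
            subprocess.run(cmd, cwd=project_dir, check=True)
    return log


for line in run_all(Path("."), save_commands("End of writing session")):
    print(line)
```

The project would still need to be closed, or at least fully saved, before these run, and a real integration would have to decide what to show the user when the pull reports a conflict.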

I care about this topic deeply and I’ll be happy to provide you both some scripts and examples for testing, and to work on tools to solve any problems you identify. Feel free to toss me a test script for bad situations and I’ll build you a working use case.