Dropbox - online backups and synching between computers

iguano · October 28, 2010, 7:01am

Thank you so much for adding in support for Dropbox in Scrivener 2.0 via the automatic backups! I’m new to Scrivener and it’s a big adjustment for me to work outside my Dropbox folder in Scrivener, but the automated backups to my Dropbox folder definitely help that. I would love to hear people’s experiences with Dropbox and Scrivener 2.0 and what kind of workflows you use between the two. I’m still trying to figure out the best approach.

For those of you not familiar with Dropbox, it’s a great online “cloud” storage. They give you 2gig free and it works like a champ.

If you’re thinking about it, maybe use my referral link which will get you an extra 250mb for signing up - Sign Up Here

Anyways, would love to hear workflows and what does and doesn’t work for you. Thanks!

iguano · October 28, 2010, 7:03am

I forgot to add that what I liked about Dropbox when I used it with, say, Word was that it would only upload the part of the file that was changed when I saved it, instead of the whole file every time. Ideally, I would like that with Scrivener. Has anyone achieved this?

kdbertel · October 28, 2010, 5:56pm

I think you may be mistaken here. I’m pretty certain that Dropbox is only atomic to the file level, not the byte level. So it uploads the entire changed Word file–not just the “parts that changed”.

I am also a bit confused on the “parts that changed” bit. Word’s file formats are binary in final form–the original, .doc, was explicitly so, and the new one, .docx, is literally a PKZIP file (or at least, one of the standard compression algorithms), and compressed files always appear as binary junk.

So, quite frankly, I don’t know how any program, much less one that works only at the file level, can tell what was changed inside of a Word document and only upload that. At least, in any way that would be more efficient than just sending the original file.

Since Scrivener actually produces a folder with subfolders and files, and Dropbox is atomic to the file level, it will only update the files that have changed as a result of using Scrivener. Although, if you’re doing it as backups, then you are producing PKZIP files (or one of the standards), which puts you in the exact same bucket as .docx, above.

In short, Scrivener’s backups’ interaction with Dropbox is exactly like Word’s files’ interactions with it.

At least, as I understand it.

iguano · October 28, 2010, 11:30pm

Here’s what their website says:
Efficient sync - only the pieces of a file that changed (not the whole file) are synced. This saves you time.

More info from their admins in the forums:

Yes, DB syncs only the changed bytes, and nothing else, so incremental uploading is a core feature of DB. Not only does this save bandwidth, but it also saves time. It allows you to restore your files to any previous versions, from the web interface.

Dropbox tries to be very smart about minimizing the amount of bandwidth used. If we detect that a file you’re trying to upload has already been uploaded to Dropbox, we don’t make you upload it again. Similarly, if you make a change to a file that’s already on Dropbox, you’ll only have to upload the pieces of the file that changed.

It definitely works on the byte level and I’ve witnessed this many times. (I work as a programmer and file systems engineer in my non-writing hours.) If you open up a Word document and edit one line of a large document and save it, it changes the part of the file where that text is, some statistics information in the file header and the file info such as Last Modified. Those three snippets of information are then uploaded to Dropbox, not the whole file.

Certain actions can completely rewrite the file, necessitating a full upload of the file, but in most cases, it doesn’t need to.

When I first started using Scrivener, I hoped this would be the same. I work in the PC world during the day, but live creatively in OSX and I’m not familiar with how it handles packages of files, which is basically what a Scrivener file is. I had run a few tests where I created a large Scriv file, saved it and uploaded the entire thing and then made a small change to it. When I saved that small change, it re-uploaded the entire thing rather than either the file or the portion of the file within the Scriv package that changed.

This is, ideally, what I would like to avoid. Previously, I would work on a Word file in my Dropbox. I constantly save as I write and after each save, it would upload the changed portions of the file right away, creating an instant online backup that I can sync with my other Mac.

I know I can get used to a different workflow and simply periodically back up my Scrivener project to my Dropbox, I’m just going to miss that instant save/sync/backup that I have become accustomed to. Unless someone can provide workflows of how they use Dropbox with Scrivener other than what I’m currently thinking.

kdbertel · October 29, 2010, 1:53am

Huh, so they are atomic to the byte. Well then, my bad on that point.

Out of curiosity, how do you determine that it only changes those bytes? As near as I can tell, whenever I edit a Word document, it uploads the whole darn thing back up (since it takes it a while, and some of mine are big files).

I’m confused about your statement about Scrivener, though. Scrivener doesn’t have just one file for any project–a project is a whole hierarchy of files and folders. On OS X, it conveniently hides it in a package, which is basically just a folder with metadata so OS X treats it like a file. But it is, in fact, nothing more than a folder. Dropbox doesn’t (yet) have any sort of support for this, so it thinks it’s a folder.

So if you have a Scrivener project saved in Dropbox, it will only update the files you end up touching during your editing session (and, especially since .rtf is not binary, it should be pretty good about the only-the-bytes-updated thing). I can confirm this from my own Dropbox logs; I created a project that ended up with over 100 files in it, but when I opened and edited some stuff, Dropbox would only update a dozen or so (depending on what I touched, and some of those are just the simple “here is where you are in the project” and user-lock stuff).
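A rough way to see the same thing without reading Dropbox logs is to hash every file in the project folder before and after an editing session and compare the two snapshots. This is only a sketch: `snapshot` and `changed_files` are made-up helper names, and it shows which files changed on disk, not what Dropbox actually transmits.

```python
import hashlib
from pathlib import Path


def snapshot(project_dir) -> dict[str, str]:
    """Map each file's path (relative to the project folder) to a SHA-256 of its contents."""
    root = Path(project_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return files that were added, removed, or modified between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Take one snapshot before opening the project, another after closing it, and `changed_files` lists only the files a file-level sync tool would have reason to touch.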

Mind, if you’re regularly backing up to a .zip file, and doing that through Dropbox, then you’re doing a more massive re-write of the file, so yeah, it’d do the whole thing.

I’m curious why the extra seconds matter so much to you, though. Are you charged per usage, or are on a high-latency connection or something? Or do you just dislike waiting the extra few seconds it’d take for stuff to sync between all of your computers once you close Scrivener on one computer?

AmberV · October 29, 2010, 3:26am

Actually it is even less than that. The part that makes the Mac treat it like a package is in the system file type database. Try changing the file extension to ‘.scri’ to see what I mean.

iguano · October 29, 2010, 6:16am

For me, what I experienced was a complete upload of a 15mb Scrivener project every time I made a small modification. Those tests were in version 1.5 and it may have had something to do with how I had it set up. (I’m very, very new to Scrivener and still figuring it out.) Still, I didn’t like the idea of a very long upload for the 15mb every time I hit the save button - and I hit Save a lot!

I’ve been running a number of tests with the 2.0 NaNoWriMo version and I don’t know if it is something to do with files being laid out in the project differently in this version or if I’m doing something different/proper but I am seeing a big difference in the upload. Instead of uploading the whole Scrivener project, it just uploads the files that have changed inside the package.

Yeah, sorry - I keep saying the wrong thing. I meant Scrivener project, which is made up of files and folders. Still getting used to the OSX file system and packages.

After my tests, I do have a plan on my workflow. I find it too difficult to get away from working within Dropbox as I like the instant syncing. I write on both a laptop and a desktop and I’m not too sure I want to constantly refresh from a synced backup in my Dropbox folder. I’m also aware of the reported issues of Dropbox and Scrivener projects becoming corrupt. It sounds like the next version of Dropbox may help solve this, though I’m not sure. In any event, my workflow is that I’m going to have my Scrivener project in my Dropbox, and I’m going to backup my projects on application exit to a folder outside of Dropbox. Dropbox will be my main backup, but if anything happens to the project, I can go to the most recent backup on either my laptop or desktop.

Jaysen · October 29, 2010, 9:34am

Besides the fact that “DB says so”, there are some basic techniques used to determine exactly which portions of a file need to be changed. The most common is a block-based checksum method, where a specific number of bytes, normally a number of disk blocks, is read and a checksum is calculated. If the checksum matches, move on to the next comparison block. If there is a difference, do a lower-level comparison and resolve the differences. This is much easier and more efficient with text than with binary. Typically with binary data there will be lots of little changes interleaved throughout the data, or a point in the data past which all data is changed. Hence the reason that a large Word file that is smaller than a Scriv package would take longer to sync.
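To make the block checksum idea concrete, here is a minimal sketch. It is an illustration only, not Dropbox's actual algorithm: the 4096-byte block size, the choice of SHA-256, and the function names are all assumptions, and it compares blocks at fixed offsets only.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list[str]:
    """Checksum each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list[int]:
    """Return the indices of blocks whose checksums differ between old and new."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    changed = []
    for i in range(max(len(old_d), len(new_d))):
        a = old_d[i] if i < len(old_d) else None  # None: block missing in old
        b = new_d[i] if i < len(new_d) else None  # None: block missing in new
        if a != b:
            changed.append(i)
    return changed
```

Overwriting a few bytes in place marks only the block that contains them. Inserting a single byte near the start shifts everything after it, so every later block mismatches, which is the "point in the data past which all data is changed" case.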

One of the newer tricks that I am seeing with some sync software is that the new zipped formats are unzipped before comparison begins. This allows the lighter weight text comparison to be done while simultaneously reducing the total BW needed to transmit the changes. It is a bit heavier on the local systems, but seems to be light enough to be worth the overhead.
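In the same spirit, a zipped file can be compared member by member instead of as one opaque blob. The sketch below does not unpack anything; it reads the CRC-32 that the zip format already stores for each member's uncompressed contents. The function names are hypothetical, and nothing here is meant to describe how any particular sync product works.

```python
import zipfile


def member_crcs(archive) -> dict[str, int]:
    """Map each member name in a zip archive to its stored CRC-32."""
    with zipfile.ZipFile(archive) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


def changed_members(old_archive, new_archive) -> list[str]:
    """Return members that were added, removed, or whose contents changed."""
    old = member_crcs(old_archive)
    new = member_crcs(new_archive)
    names = set(old) | set(new)
    return sorted(n for n in names if old.get(n) != new.get(n))
```

Both arguments can be paths or file-like objects, so this works on a .docx or on a zipped Scrivener backup alike.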

kdbertel · October 29, 2010, 12:34pm

I understand how to do it programmatically. What I’m asking is how I can determine that. Is there some Windows application that can do a binary diff on two files (I used to have one that did a text diff), or some command-line fu for it?
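Failing a ready-made tool, a few lines of script will do a crude binary diff, assuming you keep a copy of the file from before the save and one from after. The sketch below reports the byte ranges that differ at the same offsets. The function name is made up, and it only shows what changed on disk, not what Dropbox sends over the wire.

```python
from pathlib import Path


def differing_ranges(path_a, path_b) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, where the two files differ."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    common = min(len(a), len(b))
    ranges = []
    start = None  # start offset of the current run of differing bytes
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    total = max(len(a), len(b))
    if total > common and start is None:
        start = common  # the longer file's tail counts as a difference
    if start is not None:
        ranges.append((start, total))
    return ranges
```

A handful of short ranges suggests a small in-place edit; one range running to the end of the file suggests the application rewrote or shifted everything after that point.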

I was simply questioning the “When I edited this document, I know I only changed a small number of bytes!” statement. It seems like a fairly bold claim, and I was wondering how anyone could know that.

Plus, Dropbox for me only tells me when files are changed (“4 files have been uploaded” “foobar.xml has been downloaded”), never how much bandwidth it used to do it.

Now that is pretty nifty, and makes sense. Since bandwidth is turning into the current computing choke point (as opposed to CPU speed or RAM), it makes sense to use some computing cycles to save on bandwidth usage. Any idea if Dropbox does this yet?

Jaysen · October 29, 2010, 12:49pm

The most obvious check is to snoop on the network interface (look into ethereal, now called Wireshark) or on the running process (using a debugger). Being more of a unix guy I can tell you 101 ways to do this in OSX/Linux/Solaris/BSD/HPUX/Irix. I can only think of using something like cygwin to install some basic truss/strace tools on windows. I really try to stay out of the windows dev platform. Occupational thing.

The systems I am using are typically a bit ahead of mainstream tools. Some of them we wrote ourselves. I do not work with DB much, but the one set of “under the hood” checking I did on behalf of an internal customer not too long ago showed that they were doing a straight checksum based method. Nothing like unpack and sync.

Laxaria · October 29, 2010, 4:25pm

I’ve used Dropbox more as a back-up tool than a synchronisation tool… The ability to just use it to host files is amazing, really. Not good for a full computer back-up without paying, but it is really a well reputed application.

Cadence · October 29, 2010, 7:23pm

I use the Dropbox beta build, currently 0.8.112, since it allows selective syncing and does a great job syncing OS X meta-data (I tag files with Leap, so syncing OpenMeta tags is crucial to me, plus labels are synced). It can be downloaded at: forums.dropbox.com/ (“Latest Forum Build” Sticky).

I have Dropbox installed on all my computers – Windows and Mac. Plus my iPod.

The Scrivener project I’m actively working on is stored in the computer’s Documents folder and not in Dropbox. Before I quit Scriv I backup the project to zip in Dropbox and wait for it to sync. When I want to open the project on a different computer I copy the zip from Dropbox to that computer’s Documents folder, unzip, and start working on the project. When I finish, I backup to zip in Dropbox again… You get the picture.
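For anyone who would rather script that last step than do it by hand, a minimal sketch follows. The function name, the timestamped file name, and the folder layout are assumptions for illustration; Scrivener's own backup feature is what the workflow above actually uses.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir, backup_dir) -> Path:
    """Zip a project folder into backup_dir and return the path of the new archive."""
    project = Path(project_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = target / f"{project.name}-{stamp}"
    # root_dir/base_dir keep the project's top-level folder name inside the zip.
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=project.parent, base_dir=project.name
    )
    return Path(archive)
```

Run it only after the project is closed, so that every file on disk is in a consistent state before the zip lands in the synced folder.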

Jaysen · October 29, 2010, 7:38pm

Maybe I am dense, ok I am dense, but why not just put the scriv file AND the backup in DB? DB isn’t screwing up work that is active, just screwing up the saves. This will make sure that your backup always has the changes in mem as well.

Wait. Scriv dumps the files to disk so the backup would be corrupted as well. Never mind.

Cadence · October 29, 2010, 7:51pm

LOL. Now I’m confused. Hmmm…but I always am, though, ain’t I?

Jaysen · October 29, 2010, 7:59pm

ok, so here is a fundamental piece to understanding why DB chokes on scriv

When you open a project you are only opening the binder file (was binder.xml) and any file that is displayed in the editor pane. When you select another file in the editor scriv writes your changes to disk, closes the file you WERE looking at then opens the next one. This is why it is so fast to edit a project with so many little snippets and why saves don’t take a week.

So if DB whacks a file that is on disk but not being edited (not in the editor) then your backup will be corrupted.

Make sense?

Cadence · October 29, 2010, 8:22pm

Partially. Reading your description, if DB is syncing a “live” Scriv project, isn’t there a greater danger of file corruption? Isn’t it safer to work on a local copy of your Scriv project, and let DB sync a zipped backup?

Jaysen · October 29, 2010, 8:26pm

Exactly. Hence the last sentence of my first post.

The advantage to being headless is that no one is surprised when you miss the obvious like I did.

Cadence · October 29, 2010, 8:54pm

LOL… Nah. I suspect you’re anything BUT headless. Not to mention I’m forever in your debt for helping me split those pdfs.

Jaysen · October 29, 2010, 10:19pm

You’re welcome.

Now, how to cash in that IOU.

iguano · October 29, 2010, 10:38pm

If the file corrupts, will I know right away or not until the next time I try opening it? I could then restore from the previous backup.

Also, any idea what causes the file to be whacked? I’ve been using DB since it became publicly available and I’ve never had issues with any files being corrupted. Though, this is the first time that I would synch a “package” rather than an individual file such as a Word document, image, PDF, etc.

The main reason I want to not have to be synching up backups is because I have created a Scriv project from my research and work so far and it is close to 200mb. I don’t want to have to synch up the massive backup every time - it would take quite a while.

I’ve been looking at the notes on the various 0.8 builds and it seems like they may address some of the problems that have affected Scrivener in the past, but that’s an uneducated guess as I’m not too familiar with the OSX file system.