Garbage collection

(Brief report – I’m on the road right now; more details later.)

Working in Scrivener 1.51, I was wondering why the backups of a new project were swelling alarmingly, up to 24 MB. I did some digging. Turns out I have some PDFs and web archives. Running the PDFs through a compressor took it down to 14 MB. Then …

I had a brain wave and started cutting and pasting the text of the web archives into new text documents in Scrivener, minus the bloaty graphics. (A lot of newspaper web pages run to around 1 MB to hold about 500 words of text, plus junk like adverts and Flash animations.)

My workflow was: select the desired text in a web archive, create a new file, paste the text into it, drag the web archive to the trash, and empty the trash. But the backup zip archive wasn’t shrinking.

I did some poking around in my .scriv project directory from the command line, and it looks as if web archive files are not being deleted when I empty the trash. Manually zapping the files (with “rm -i”) works and doesn’t seem to damage the document structure in any way – I’m down to a 2.2 MB backup archive now. This also seems to affect deleted PDF files: I replaced a bunch with squished JPEG images or with text files holding their copied text, trashed the PDFs, and found they were still kicking around in the project directory.

I don’t have time to verify whether this is a repeatable bug (create a new project, add stuff, delete stuff, see if it’s still there), but I have my suspicions … if it is, fixing it would be very helpful to those of us who are fanatics about backups!

Finally: the BinderStrings.xml file is large (about 30-40% of my project in size!) and doesn’t seem to shrink any. I’m wondering if there might be a garbage collection issue here, too.
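To see which files inside a project folder account for most of a backup’s size (leftover web archives and PDFs, or a large BinderStrings.xml), a short script can list the biggest files. This is a minimal sketch assuming Python 3 is available; it treats the .scriv package as an ordinary folder and assumes nothing about how Scrivener lays out its files. The script name scriv_sizes.py is hypothetical.

```python
import os
import sys


def largest_files(project_dir, limit=10):
    """Return up to `limit` (size_in_bytes, relative_path) pairs, biggest first.

    Walks the whole folder tree; no assumptions about Scrivener's layout.
    """
    sizes = []
    for root, _dirs, files in os.walk(project_dir):
        for name in files:
            path = os.path.join(root, name)
            # lstat so a symlink is measured itself rather than followed
            size = os.lstat(path).st_size
            sizes.append((size, os.path.relpath(path, project_dir)))
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 scriv_sizes.py MyNovel.scriv
    for project in sys.argv[1:]:
        for size, rel in largest_files(project):
            print(f"{size / 1_000_000:8.2f} MB  {rel}")
```

Running it before and after emptying the trash should show whether a trashed web archive or PDF is still on disk.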

Not to address the bug, which I haven’t looked into, but you might be interested in the preference for importing web pages. Near the very bottom of the “General” section there is an import option for handling HTML files. The default is WebArchive, but you can switch it to text, which will automatically do the steps you are currently taking to get pages in.

Additionally, you can tell Scrivener to do this when dragging and dropping WebArchives from your drive.

Hi,

Thanks for the bug report. I just tested this, though, and it worked fine: the .webarchive got deleted from disk when I emptied the trash. You mention that this happens with a backup; could you give me your exact steps? Is any synchronisation involved? I’m just wondering, since you mention that this only affects a backup file (unless I misunderstand you), whether it could be some synchronisation process whereby files aren’t deleted but only new files are added. If this is happening in the original project or through Scrivener’s Backup To feature, then that’s different, of course.

Thanks and all the best,
Keith

I wasn’t synchronizing with anything; just making copious backups between steps (in case I accidentally corrupted my project). Yes, what I was noticing was this:

  • Import a web page (import options: convert HTML files to web archive)

  • Select and copy contents of web page to a new file in the binder

  • Drag web archive to trash

  • Empty trash

… And the .webarchive file persists in the .scriv folder.

Other possibilities: the only thing that occurs to me is that I’ve got CVS/Subversion-compatible saving enabled.

Update: I just tried to duplicate the problem with a new project, and the test webarchive I imported was deleted correctly when I emptied the trash. So it looks like I have a subtly corrupt project (which is a novel under contract, with a real-life deadline): whoopee!

(If you want, I can send you a saved copy of my preferences and the project – I kept some of the backups. NB: I’m on the road, in a hotel with limited broadband, so this may have to wait until next week.)

((Off now, to export everything into a stable format, just in case.))

One possibility is simply a permissions conflict, so that somehow Scrivener isn’t being given permission by the file system to delete the web archive (and other files, by the sounds of it). Can you delete other files from the project successfully (I mean text files), or do they linger too? Also, when you reopen the project, any trashed files that still exist in the file wrapper should appear in a _Restored Files folder in the project. Is this happening?
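The permissions theory can be tested from the command line without involving Scrivener: on macOS, as on other POSIX systems, deleting a file requires write and search (execute) permission on the folder that contains it, not on the file itself. The sketch below, assuming Python 3, walks a project folder and lists subfolders the current user could not delete from. The function name undeletable_dirs is made up for illustration, and it only checks what os.access() reports (it ignores the sticky bit), so treat it as a rough first check rather than proof.

```python
import os
import sys


def undeletable_dirs(project_dir):
    """Return relative paths of folders where the current user cannot delete entries.

    Only the result of os.access() is checked; the sticky bit is ignored.
    """
    blocked = []
    for root, _dirs, _files in os.walk(project_dir):
        # Removing a file needs write + search permission on its parent folder.
        if not os.access(root, os.W_OK | os.X_OK):
            blocked.append(os.path.relpath(root, project_dir))
    return blocked


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_deletable.py MyNovel.scriv
    for project in sys.argv[1:]:
        for rel in undeletable_dirs(project):
            print("cannot delete from:", rel)
```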

It would be great if you could send me the file that’s having the problems, thanks. Just zip it up and send it to me at support AT literatureandlatte DOT com, and I’ll take a look as soon as I get it, whenever you get a chance.

Thanks again and all the best,
Keith

I’ll have to check the permissions when I get home. One thing I’ll flag: I frequently use rsync to replicate my work directory tree between multiple machines. (I need to check the UID I’m using on each machine – that’s a possible source of headaches, isn’t it?)
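For the UID question, one check is to list everything in a project folder whose owner UID differs from the current user’s. This is a sketch assuming Python 3; foreign_owned is a hypothetical name. Running it on each machine, and comparing against the output of `id -u` there, should show whether the numeric UIDs line up after files are replicated between machines.

```python
import os
import sys


def foreign_owned(project_dir, uid=None):
    """Return (relative_path, owner_uid) for entries not owned by `uid`.

    `uid` defaults to the current user's real UID (os.getuid()).
    """
    if uid is None:
        uid = os.getuid()
    paths = [project_dir]
    for root, dirs, files in os.walk(project_dir):
        paths.extend(os.path.join(root, name) for name in dirs + files)
    odd = []
    for path in paths:
        owner = os.lstat(path).st_uid  # lstat: don't follow symlinks
        if owner != uid:
            odd.append((os.path.relpath(path, project_dir), owner))
    return odd


if __name__ == "__main__":
    print("current UID:", os.getuid())
    for project in sys.argv[1:]:
        for rel, owner in foreign_owned(project):
            print(f"owned by UID {owner}: {rel}")
```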

I’ll mail you the file when I get home (2 Mbps outgoing bandwidth rather than a wet piece of string – it’s a 14 MB zip archive).

Are you rsyncing over ssh or via an intermediate drive? Either way, the perms should match the UID and umask of the user logging in (ssh on the remote or login on the local).

This assumes you are not using root for the rsync. That is a bad idea for more reasons than just permissions.

Let me know if that does not make sense.

I’m rsyncing over ssh and not as root.

Update: I just looked in /Applications and the app permissions turned out to be … unusual, let’s say. (I change machines frequently and am overly reliant on Apple’s Migration Assistant.) So this may just be an idiot permissions whoopsie.

Does Scrivener leave any kind of logfile lying around that I can look over? Or is it possible to enable debugging output?

I’m afraid there’s no log file or debugging flags, no. If you’re syncing the structure between machines, this could very well be the problem. It wouldn’t have permission to remove the files on some of the machines, perhaps, so that when it gets re-sync’ed, they get put back in the original too. Would that make sense? I have to admit I’ve never used rsync so I will have to look into it, but this is my first thought.
All the best,
Keith

KB,

rsync over ssh assumes the permissions of the authenticated user on either end. This means that as long as the UID used for the ssh authentication is the same one used to log in and edit the files, deletes should be permitted.

Wait… what is the exact rsync option line? Delete is off by default, so that might be it.

I’m using rsync -avz --delete, so the delete option is on. Besides, what I was noticing seemed to be Scrivener not deleting files in a given project when I deleted them, and I’m on the road and not rsyncing with my server right now.

If you’re not running it as root, that should be OK, but it might still be leaving the UID out of whack. Try just rsync -rtlpvz --del instead. This drops the -g and -o that -a implies, which would set group and owner to match.
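To compare the two option sets safely, it can help to run the command first with rsync’s --dry-run (-n) and --itemize-changes (-i) options: nothing is copied or deleted, and lines starting with *deleting show what --delete would remove. Per the rsync manual, -a is shorthand for -rlptgoD, so -rtlpvz drops -g (group), -o (owner) and -D (device and special files); --del, used above, is rsync’s alias for --delete-during. The Python wrapper below is only a sketch: rsync_command is a hypothetical helper, and the paths and the host name "server" are placeholders.

```python
import shlex


def rsync_command(src, dest, keep_owner=False, dry_run=True):
    """Build an rsync argument list (hypothetical helper).

    keep_owner=True  -> "-avz", the original command; -a implies -g and -o.
    keep_owner=False -> "-rtlpvz", i.e. -a without -g, -o and -D.
    """
    flags = "-avz" if keep_owner else "-rtlpvz"
    cmd = ["rsync", flags, "--delete"]
    if dry_run:
        # Preview only: nothing is transferred or deleted, and each change
        # (including "*deleting" lines) is itemized.
        cmd += ["--dry-run", "--itemize-changes"]
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; "server" stands in for the remote host.
    print(shlex.join(rsync_command("Writing/", "server:Writing/")))
```

Once the dry-run output looks right, the list built with dry_run=False can be passed to subprocess.run(cmd, check=True).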