Importing a web page crashes scriv

I tried to import the following URL via File->Import->Web Page: … print.html

It repeatedly crashed Scriv.

I tried importing into a different project, with the same result (so it's not a corrupt project).

Then I saved the page to a .webarchive from Safari. Importing the webarchive also crashed Scriv.

I tried running the webarchive through textutil(1) in the terminal, and textutil segfaulted too.

My guess is that something in the web page is causing Apple’s TextKit to barf, even though the page renders okay in Safari. The odd thing is that the next time I tried importing it (from the URL), it worked, after about five crashes in a row.

I have no idea what’s causing this – I have a feeling it may be in one of the system libraries Scriv depends on, rather than Scriv itself – but I thought I should flag it up.

There’s definitely something going on with that web page when it’s saved as a web archive. I saved it from Safari as a webarchive, then double-clicked the .webarchive file to reopen it in Safari. The page repeatedly reloaded with an error message at the top, and after a couple of seconds the OS X crash report came up saying that “Safari web content quit unexpectedly”. So it seems that, when saved as a web archive, the page crashes one of the modules of WebKit, which is what Scrivener uses to render web pages. Unfortunately that puts it out of my control, as the crash is occurring inside Apple’s frameworks.

Luckily, there are other ways to import a web page. Instead of saving as a webarchive from Safari, could you save it as source only? Chromium and Firefox can also save web pages to your local machine, and Scrivener may be able to import those; or you could clean them up with a text editor before importing, for instance by deleting all the JavaScript. If the images are needed, Firefox will let you save the page as “Web Page, complete”, which downloads all the images to local files. The HTML file can then be edited and opened or imported in Scrivener (or opened in Safari and then saved as a webarchive).
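If hand-editing is tedious, the “delete all the JavaScript” step can be scripted. The following is only a rough sketch, not a Scrivener feature: it assumes the page was saved as a UTF-8 .html file, and the script name and file names are hypothetical. It uses a regular expression to drop every <script>…</script> element; that is cruder than a real HTML parser and leaves inline handlers such as onclick="…" in place, but it is often enough to get a page to import.

```python
import re
import sys

# Matches a whole <script ...>...</script> element, including its contents.
# The non-greedy .*? stops at the first closing tag; DOTALL lets a match span lines,
# and IGNORECASE also catches <SCRIPT>. The \b keeps <noscript> untouched.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html: str) -> str:
    """Return the HTML with every <script> element removed."""
    return SCRIPT_RE.sub("", html)


def main() -> None:
    # Hypothetical usage: python strip_scripts.py saved_page.html cleaned_page.html
    if len(sys.argv) != 3:
        print("usage: strip_scripts.py INPUT.html OUTPUT.html")
        return
    src, dst = sys.argv[1], sys.argv[2]
    # Assumes UTF-8; undecodable bytes are replaced rather than aborting the run.
    with open(src, encoding="utf-8", errors="replace") as f:
        html = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_scripts(html))


if __name__ == "__main__":
    main()
```

The cleaned file can then be imported into Scrivener, or opened in Safari and re-saved as a webarchive, as described above.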

I notice in this case that the URL ends in “print”, indicating this is the printer version of the article, so an even simpler way to save it would be to open the page in a web browser (any browser, as far as I can tell), select the article text, copy it, then create a new document in Scrivener and paste. Formatting and links will be preserved. [added: come to think of it, pix will also transfer.]

