Where are Web Archives actually stored?

It appears as if the archive still lives somewhere out on the internet. I have disconnected from the internet, and I’m able to load some of the archived web pages… and some I can’t.

I want to be sure that the web pages I am importing will still be accessible months from now…

Is Scrivener archiving and placing a copy of the HTML file somewhere within the project?

Or is the web page still on the Internet?


Scrivener downloads the page into the .scriv project as a .webarchive file (the same format that Safari uses when you select File > Save As…). However, downloading to a .webarchive is far from perfect. Not all of the content gets saved into the webarchive: the text is downloaded, but certain images and other resources are not. I have looked into this, and I don’t think there is anything I can do to force everything to download into the webarchive. You’ll see that Safari has the same problem: if you load a webarchive saved by Safari while it is offline, not all of the content will be visible…
All the best,

Unfortunately this is a limitation of the WebArchive format and the way Apple’s system retrieves data. As you note, it doesn’t seem to grab everything necessary to display the file offline, which of course means it isn’t very reliable as a true archive format. Another problem is that WebArchives rely on caches stored in your Library. Caches are not meant to be permanent, and once those disappear, content may disappear from the archives. There are other problems with the format. It isn’t very transparent, meaning you cannot easily edit the archived page. Editing is sometimes necessary if the page had a reload code embedded in it; sometimes it is just desirable to remove advertising cruft and so forth. The archival application, Together, addresses some of this with an easy-to-use WebArchive editor, but still doesn’t solve the network access issue.

There are several options:

  1. If you need the full content (media, stylesheets, JavaScript, etc.) of the page, I recommend using another browser like Firefox, Camino, or Opera. They all have save modes which will download everything required to display the page. Alas, these methods do not mix well with any application taking advantage of WebArchives, such as Scrivener and many archival applications, because they save the web page alongside a folder with all of the necessary content, rather than in a cohesive bundle.

  2. If you just need the text, I highly recommend changing Scrivener’s preferences to use text archival for web pages. This method usually requires a little clean-up: deleting a bunch of header and footer gunk, and maybe a sidebar. Another alternative is to print pages as PDF and then archive those.

The incomplete download is a known bug. In fact the latest WebKit releases have it fixed, but these are not yet part of the main stable operating system release. You could update WebKit manually, but this carries risks in that it might break existing applications that haven’t been updated for it yet.
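
If you want to check what actually made it into a given .webarchive before trusting it offline, here is a rough sketch in Python. It relies on the fact that a .webarchive is a binary property list with a `WebMainResource` entry and a `WebSubresources` list. The function names and the project path in the usage note are made up for illustration, and the sketch ignores nested frames (`WebSubframeArchives`), so treat it as a quick inspection aid rather than a complete tool.

```python
import plistlib
from pathlib import Path


def list_webarchive_resources(data: bytes) -> list[tuple[str, str, int]]:
    """Return (url, mime_type, size_in_bytes) for each resource in a webarchive.

    `data` is the raw content of a .webarchive file. The main page comes
    first, followed by any subresources (images, stylesheets, scripts).
    """
    archive = plistlib.loads(data)  # auto-detects binary or XML plists

    resources = [archive.get("WebMainResource", {})]
    resources += archive.get("WebSubresources", [])

    listing = []
    for res in resources:
        if not res:
            continue  # no main resource present
        listing.append((
            res.get("WebResourceURL", ""),
            res.get("WebResourceMIMEType", ""),
            len(res.get("WebResourceData", b"")),
        ))
    return listing


def find_webarchives(project_dir: str) -> list[Path]:
    """Find every .webarchive file under a folder (e.g. a .scriv project)."""
    return sorted(Path(project_dir).rglob("*.webarchive"))
```

For example, looping over `find_webarchives("MyNovel.scriv")` (a hypothetical project name) and calling `list_webarchive_resources(path.read_bytes())` on each result prints out which URLs are stored in the file. If the images you care about are missing from the list, that page will not display fully offline, and saving it as a PDF or as text is the safer choice.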

Thanks for the replies… they are a lot of valuable help.

AmberV… when you said to “print the pages as a PDF and then archive those…”

Did you mean to say “Save the pages as a PDF?”

I’m going to try that while I wait for your reply.

Once I have saved the page as a PDF- I’m assuming I can then import the PDF and I’ll have a “permanent” copy of the web page(s) stored within the project.

Is this correct?



I think what Amber means is that in order to save as PDF you have to go to File > Print, then select PDF > Save as PDF. But yes, then you can bring that PDF file into Scrivener and all of the images will remain there. Her other suggestion of converting to text is also a good solution.

It’s good to hear that Apple have this fixed for future versions of WebKit - I didn’t know that!

All the best,

Yes. This is how I do it. I select Print. Then under the PDF drop-down I select Save as PDF. The nice thing about this is that (1) it is easy to organize and print, and (2) it saves link information.

I never use the web archive feature because to me it is too limiting.

Of course, if all else fails and the PDF is not giving you the information you need, you can always take an old-fashioned screenshot with ⇧+⌘+3 (full screen) or ⇧+⌘+4 (selected area) and then just pull in the “pics” for static reference. This is usually useful when you are trying to reference a scene in a video file or Flash file that is not “printing” correctly to PDF.

If that makes any sense?