When you drag a web page into Scrivener, it defaults to saving it as a Web Archive. If that’s the case, then why does it need to access the net when viewing that same page in Scrivener later? It seems to be loading images from the original web site instead of using what it downloaded originally. If I had saved it as a Web Archive and then dragged that into Scrivener, presumably it wouldn’t do that. My question is why is it doing it when I drag a page into Scrivener?
I’m afraid I don’t actually know the answer - Scrivener just uses Apple’s WebKit (the same used by Safari) for displaying and saving web archives. The behaviour is determined by WebKit itself - certainly there’s nothing in Scrivener’s code that does this; all Scrivener does is call a load request on the file URL; it has no knowledge of the original web page itself.
Scrivener supports importing and viewing the WebArchive format via the WebKit component of the Mac. Whatever that format does or does not do correctly is a problem with the format and with the viewer that has been supplied for working with it. I’ve never been a big fan of it for this reason, and because I generally do not want the entire page anyway: all of the data-mining tracking bugs, the Flash ads that make WebKit unstable and a security nightmare, and even the navigation components around the content I intend to archive. I want to trim all of that junk out, so for my purposes importing as text is the best solution.
And yes, some legitimate stuff can end up not being archived as well; it depends on how the page was coded. Increasingly, web designers create pages that are all but inoperable on their own. The browser has to go and fetch data from potentially dozens of different web sites, for everything from CSS to JavaScript to images, just to put it all together into a form that most people will recognise. I believe WebArchive does some encapsulation, but not everything will get imported.
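For anyone curious which pieces actually made it into a given archive, here is a rough Python sketch. It relies on a .webarchive file being a property list with a `WebMainResource` entry and a `WebSubresources` list. The function name `missing_resources` is made up for this example, and the UTF-8 decoding is an assumption. It only looks at `img`, `script` and `link` tags, so anything pulled in by CSS or JavaScript is not covered, and it says nothing about what WebKit decides to do when it displays the file.

```python
import plistlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class _RefCollector(HTMLParser):
    """Collect absolute URLs referenced by img, script and link tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.refs.add(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.refs.add(urljoin(self.base_url, attrs["href"]))


def missing_resources(archive_bytes):
    """Return referenced URLs that are not stored in the archive (hypothetical helper)."""
    archive = plistlib.loads(archive_bytes)
    main = archive["WebMainResource"]
    base_url = main["WebResourceURL"]
    # URLs of everything the archive captured alongside the main page.
    archived = {r["WebResourceURL"] for r in archive.get("WebSubresources", [])}
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    html = main["WebResourceData"].decode("utf-8", errors="replace")
    collector = _RefCollector(base_url)
    collector.feed(html)
    collector.close()
    return sorted(collector.refs - archived)
```

You would call it with something like `missing_resources(open("Page.webarchive", "rb").read())`, where the file name is just a placeholder. Any URL it returns is one the page refers to but the archive does not contain.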
Yes, but the whole point of a web archive is that it’s off-line. Scrivener for some reason is trying to load elements from the original page online, which is what doesn’t make sense.
Yes, but as we have both said, that’s down to Apple’s WebKit, nothing to do with Scrivener. You can always convert them to text or import the files as PDF instead.
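If you would rather do the text conversion outside Scrivener, a minimal Python sketch follows. This is not how Scrivener does it. It assumes the archive is a property list whose `WebMainResource` holds the page HTML, assumes UTF-8, and uses a made-up function name, `webarchive_to_text`. It simply drops the tags and skips `script` and `style` content, so the result is plain and unformatted.

```python
import plistlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Keep visible text; ignore the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Record non-empty text that is not inside a skipped tag.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def webarchive_to_text(archive_bytes):
    """Return the main page's text, one chunk per line (hypothetical helper)."""
    main = plistlib.loads(archive_bytes)["WebMainResource"]
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    html = main["WebResourceData"].decode("utf-8", errors="replace")
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)
```

Since it only reads what is already in the file, nothing is fetched from the original site.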
You can set webarchives to open in TextEdit by default. Not pretty, but they shouldn’t update when loaded in the external editor (TextEdit), as they do when loaded in Safari.
Given how useful it can be to have an offline archive of the text and visual information from web pages, I’d like to request that Mac users get some kind of “PDF” import option when they pull a web page into Research, the way PC users do.
Of course it is possible to export to PDF in advance and then import, but saving the extra step and keeping the nice formatting Scrivener uses would be ideal and a great time-saving feature, IMO.
I can’t speak to whether doing so directly is possible with the tools available on a Mac, but all Mac software that can print through the standard OS X printing dialogue can create a PDF. It is a native feature of the operating system, and Scrivener can be a direct target from any print dialogue, via the “PDF” button. If you don’t see that option, you might need to install it manually. Instructions are provided in §11.4, Print as PDF to Scrivener, pg. 141 of the user manual.
I like to combine the ability to “print” a PDF of a web page directly to Scrivener with the “Reader” feature of Safari, which cleans up the fonts and usually includes only relevant images from the web page. Some Evernote extensions available in other web browsers have a similar website view that can often be used to make uncluttered PDFs for importing to Scrivener.
A good one for Firefox (which also has a built-in reader mode) is the HackTheWeb extension, which takes its inspiration from the now defunct Aardvark. When you use that extension, your mouse highlights regions of the page with a red rectangle, and you get a few keys to do things with those regions. For example, you can select the article portion of a page and hit the i key to “isolate” it, stripping every element outside of that box out of the page. If you use Stylish as well, you can even save your modifications so that the site will go on looking the way you trimmed it.
It’s useful for those occasions where the reader feature doesn’t work right, or where it changes too much of the article’s text formatting.