Importing a web page, does it scrape the HTML?

Will importing a webpage scrape all the images, html, css, etc?

Perhaps give it a try and see if it meets your needs?

I did, and I like it. But I don't know if it's getting the HTML/images from the internet or from its own internal files and data. In other words, is the data archived permanently?

The answer is a bit complicated by modern web design practices. The WebArchive format, which the Mac uses to store offline copies of a page and Scrivener uses, was designed well before “Web 2.0” happened. For simple pages, the answer is yes, it scrapes everything it needs to display the page properly—any directly referenced JavaScript, CSS, images and other resources should be downloaded and bundled up in a tidy fashion.
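For the curious, a `.webarchive` file is just a binary property list. Here is a minimal Python sketch of how the page and its resources are bundled together, assuming the commonly observed `WebMainResource`/`WebSubresources` key layout (this is what the format looks like in practice, not an official spec):

```python
import plistlib

# A .webarchive is a binary plist. In the layout observed in real files,
# WebMainResource holds the page itself, and WebSubresources is a list of
# the bundled CSS/JS/image resources.
archive = {
    "WebMainResource": {
        "WebResourceURL": "https://example.com/",
        "WebResourceMIMEType": "text/html",
        "WebResourceData": b"<html><body>Hello</body></html>",
    },
    "WebSubresources": [
        {
            "WebResourceURL": "https://example.com/style.css",
            "WebResourceMIMEType": "text/css",
            "WebResourceData": b"body { color: black; }",
        },
    ],
}

# Round-trip through the binary plist format, the way the file is stored
# on disk, then list everything that was archived.
blob = plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
loaded = plistlib.loads(blob)

def list_resources(webarchive):
    """Return the URLs of everything bundled inside a parsed webarchive."""
    urls = [webarchive["WebMainResource"]["WebResourceURL"]]
    urls += [r["WebResourceURL"] for r in webarchive.get("WebSubresources", [])]
    return urls

print(list_resources(loaded))
```

Once a page is saved this way, everything `list_resources` reports lives inside the one file, which is why a simple page keeps displaying even with the network off.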

Where that starts to break down is in pages that aren’t fully “there”, so to speak, that stream their contents in dynamically. This can range from missing advertisements (which hardly anyone would grumble about) to getting nothing at all but an empty container that looks blank, or even just a login page you can’t really get past.

In some cases, you can get around such problems by archiving the printable version of a page. But many sites have shifted away from offering a dedicated "print view", since modern standards make printing a dynamic thing that happens when the browser prints the page directly, so such a view is increasingly rare. In some cases you might need to hunt down a browser extension that lets you archive what you see in the browser as a PDF file, and then import that instead. Or if you can't find something like that, print from your browser and use the dialogue to save as PDF.
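If no extension is handy, a headless browser can also do the page-to-PDF step from the command line. A rough sketch of building that command, assuming a Chromium-family browser is installed (the binary name varies by platform, so the function probes a few common ones):

```python
import shutil

def pdf_command(url, outfile):
    """Build a headless-Chromium command that prints a page to PDF.

    Assumes a Chromium-family browser; the executable name differs by
    platform, so probe a few common candidates before falling back.
    """
    for name in ("chromium", "chromium-browser", "google-chrome", "chrome"):
        browser = shutil.which(name)
        if browser:
            break
    else:
        browser = "chromium"  # fallback; adjust for your own system
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={outfile}",
        url,
    ]

cmd = pdf_command("https://example.com/", "page.pdf")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The resulting PDF can then be imported like any other file, which sidesteps the "page isn't fully there" problem because the browser renders the dynamic content before printing.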


DEVONthink will, in most cases, let you download a webpage in different formats. I usually download as a PDF, but some sites are tricky and I have to experiment until I find a format that works.