Snapshots of web pages

I noticed that if I import a web page, it’s really just a link to that web page, and the contents are not searchable like a text document. Obviously, however, web pages can change over time, and sometimes the information you are looking to keep can be lost. It would be very cool if you could use the Snapshot feature on a web page, keeping a snapshot of its content in the project and making it searchable. Having multiple snapshots from different times could also be very useful, depending on the web page you are capturing.

This could also enable (in version 2 and onwards) the ability to compare the text in different versions of the web page, although I guess I could see that being more difficult than comparing plain text.

There are several ways to import a web page, handled in the Import Options section of the General preferences pane. You can convert HTML to WebArchive or to text. The former retains the original page presentation; the latter converts the HTML to RTF, which does a good job of extracting the information, a reasonable job of retaining formatting, and a poor job of preserving presentation. WebArchives are immutable, and RTF conversions are, like all rich text files, editable.

It sounds like you are using the WebArchive function, which downloads (most) everything required to display the page, and packages it into a single format that is portable and works regardless of connectivity. That it looks exactly like the original web page does not mean it is loading the page from the Internet. So what you see in Scrivener after you import is an archived copy that won’t change.
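For the curious, a WebArchive file is essentially a binary property list that bundles the page and its resources into one file. Here is a minimal Python sketch of how you could peek inside one; the plist keys (`WebMainResource`, `WebSubresources`, `WebResourceURL`) are the ones WebKit uses, but the function itself is just an illustration, not anything Scrivener does:

```python
import plistlib

def list_archive_resources(path):
    """Return the URLs of everything stored inside a .webarchive file."""
    with open(path, "rb") as f:
        archive = plistlib.load(f)  # auto-detects the binary plist format

    # The page itself lives under WebMainResource; images, stylesheets,
    # scripts, etc. are bundled under WebSubresources.
    urls = [archive["WebMainResource"]["WebResourceURL"]]
    for sub in archive.get("WebSubresources", []):
        urls.append(sub["WebResourceURL"])
    return urls
```

Every resource this lists has its data stored inside the file itself, which is why the archive renders offline and doesn’t change when the original page does; an iframe whose content was never captured is the exception discussed below.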

There are a few exceptions to this. Apple didn’t do a 100% perfect job, and some elements remain linked to the web and will change over time. This is particularly true of iframe elements, I believe, which ordinarily are nothing to worry about, since most pages just use those for ads. Some pages do use iframes for actual content, though, and that is where it can get to be a bother. These cases are not common, though; in fact, I think not all iframes are susceptible to this issue.

Snapshots wouldn’t solve the problem, because it is a problem in the WebArchive format itself. Even if you snapshot the WebArchive and go back to view it later, those live elements could still change in accordance with the above bugs, since the format itself is at fault.

You should be able to search within a page though. If I download the L&L homepage, and then search for “award” that page returns in the result list. Perhaps the lack of highlighting is what threw you off? [b]Cmd-F[/b] works fine too, and that does highlight.

To say the least. :slight_smile:

My advice, if the information on the page is more important to you than the presentation: use the text import option. You’ll get all the benefits of snapshots and everything else that you can do with the regular editor in Scrivener, such as highlighting, annotating, etc.

I am new to Scrivener, but what I did was drag a web page from a browser into a research folder. When I use the search in the top right corner, it gets no results from the web page.

After reading your reply I went to the app and played around a bit until I noticed the ‘Add…Web Page…’ option. I tried that with the same web page and sure enough it was searchable. I guess this means there are two types of web pages, ones that are imported and ones that are just links. Of course, from looking at the two pages in my research one wouldn’t be able to tell which was which.

I guess if I have a feature request here, it would be to have more of an indication of what is a link and what is a web archive. It would also be nice if there was highlighting in searches…

Hmm, while you are right that there is more than one way to import a web page, the result should be identical. The menu command is just there for people who are either using a web browser that doesn’t support URL dragging, or who don’t have the web page loaded in a browser. Dragging the URL from a compliant browser to the Research folder in the Binder ought to produce the same WebArchive that [b]Cmd-Opt-W[/b] does.

What you are also correct about is that dragged URLs don’t get added to the search index. This is actually a bug, I think, that shouldn’t be happening. Fortunately the bug has been fixed in the latest test builds for 2.0, so it shouldn’t be a limitation for much longer. I tried to fix the issue with Opt-Save to reset the search index, but that didn’t seem to work for me. I’ll summon the developer and see if he has any thoughts on this matter that can help you out for now. Like I say, though, it looks cleared up for the future.

And to be clear, stuff that is dragged into the Binder shouldn’t be links. The only kind of linking that Scrivener has is very clearly established in the References pane, which doesn’t ever show up in the Binder so there shouldn’t be any confusion.

Do you have a reproducible case where a web page is updating itself after a URL drag, but not updating itself after importing from the menu?

Concerning the updating, I don’t know that that is happening. It was an assumption on my part based on the fact that the page was not searchable. I just assumed it was a link and not an archive, and thus not searchable. I didn’t think it was archived and just not added to the search index. If this is fixed in the next version, great.

I still think having some way to keep multiple versions of the same web page (other than just adding new archives) would be nice.