Scrivener Link -> Web Archive not working

Having some trouble with Scrivener Linking on my MBP, using Scrivener 1.53 on OS X 10.6.1

I have added a few web archives (Add Web Page) to my Research folder. When I try to apply a Scrivener Link from a term to an archived web page, I’m seeing two problems. First, the resulting page just hangs at “Loading web archive…” in the main window. Second, the URL of the original page appears at the bottom, but clicking it takes you to literatureandlatte.com, not where it is supposed to go.

The web archive appears correctly if I browse to it in the leftnav hierarchy, so the first issue only seems to happen when the page is reached via a Scrivener Link. The second issue (the URL target), however, occurs regardless of whether you go there via the Scrivener Link or open the archive from the leftnav.

Thanks!

I can’t reproduce the first at the moment - could you take a look in Console (/Applications/Utilities/Console.app) and see if there are any errors reported there?

The second issue, whereby the link in the footer goes to our home page, is a known issue that was introduced in 1.52 and will be fixed in 1.54, due out in the next couple of weeks.

Thanks and all the best,
Keith

Thanks for the ACK on problem #2. Problem #1 is not recurring at the moment, but from around the time of my post (note we are in different time zones), I do see an error message in the Console:

9/18/09 2:57:37 PM [0x0-0x305305].com.literatureandlatte.scrivener[5188] Debugger() was called!

There are no other errors for Scrivener in my console log, and the only other errors within ± 30 mins occurred just prior to this Scrivener error. They were:

9/18/09 2:54:06 PM [0x0-0x2c92c9].com.operasoftware.Opera[4790] Opera(4790,0xa02c8500) malloc: *** mmap(size=2416427008) failed (error code=12)
9/18/09 2:54:06 PM [0x0-0x2c92c9].com.operasoftware.Opera[4790] *** error: can’t allocate region
9/18/09 2:54:06 PM [0x0-0x2c92c9].com.operasoftware.Opera[4790] *** set a breakpoint in malloc_error_break to debug

Opera is my default browser; I’m not sure whether Scriv calls that to render in the main window, or whether it uses WebKit / something else. Thanks!

Strike that last report: I have recreated it.

Nothing is logged in the Console this time; it wasn’t repeating earlier because I had cleared out the failing web site and replaced it with a PDF.

The error recurs when there is an HTML anchor appended to the URL. For example:
http://en.wikipedia.org/wiki/Castor_oil#Traditional_or_folk_medicines

I would prefer to be able to use anchor tags for long docs, so that my notes can jump me straight to the matter of interest, while still capturing the full source.
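
For what it’s worth, the anchor is just the portion of the URL after the #, so a possible stopgap until 1.54 might be to archive the fragment-free URL and keep the anchor in my notes. A quick illustration (nothing Scrivener-specific, just showing how the example URL splits):

```python
from urllib.parse import urldefrag

url = "http://en.wikipedia.org/wiki/Castor_oil#Traditional_or_folk_medicines"
page, anchor = urldefrag(url)  # split off the #fragment
print(page)    # http://en.wikipedia.org/wiki/Castor_oil
print(anchor)  # Traditional_or_folk_medicines
```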

Thanks again!

Thanks for the extra info. After testing it in my 2.0 build I found it worked fine there, but in my 1.5x build I had the same problems - in other words, I have already fixed it for 2.0. I have therefore copied the fix into the next build of 1.5x, so 1.54, which will be available in the next couple of weeks, will fix this.
All the best,
Keith

iMac 2.8 GHz Intel Core 2 Duo running 10.6.1 and using Safari 4.0.3

I’m getting the URL jumping to the Lit and Lat page as well. I only noticed it very recently, so I don’t know whether it has anything to do with Safari interaction.

Roll on 1.54 :laughing:

Hello,

Just checking if I am correctly understanding Scrivener’s web archive feature.
As I do research on the web and find a page I need, I copy the URL to the system clipboard. Then, when I choose to import a web page in Scrivener, it prompts me for a title and loads the page in the Scrivener editor window. So if I am offline, the page is still available to me because it is saved as a web archive.
But when I go to use the archived page (the homepage of a website) and click a link on it, Scrivener opens an external browser and goes out to the internet to fetch the page that link points to, rather than an archived copy.
So it seems Scrivener only archives one page deep from the URL.

What about information from the pages that web page links to: is Scrivener able to go and get each of those pages, so that I can view the whole website, and not just its first page, when I am offline?
Or would that make an unmanageably huge archive?

Thanks for any direction or advice you can provide.

Sincerely,

C.H.

You are correct in your assumption that Scrivener only downloads the initial page you tell it to. Going any further is a fairly involved problem, actually. It’s one thing to tell a downloader that you only want to download page links that refer back to the same area of a website, but that very well might not get everything you want. You can increase the tolerances for what you’ll download, but every time you do that the risk of getting way more than you intended will increase. For instance, it’s not even safe to assume “same website”; consider someone who downloads a blog from blogger.com. And just opening it wide up to any old link whatsoever will inevitably lead to you shutting down the program before it fills up every molecule of your hard drive trying to download a substantial chunk of the World Wide Web. That last word, it’s very appropriate. :slight_smile:

So the solution is a plethora of options which must be constantly fiddled with. No two bulk downloads are the same, and each requires a degree of playing around with level thresholds, type filters, total file max limits, interval masking, connexion multiplexing, and domain/directory limiters to get the solution just right. Whole programs are devoted to the problem, and they work with varying levels of effectiveness. Not everyone wants people to download their whole site (or even parts of it), and such sites employ restrictions that these sorts of programs are supposed to honour; some downloaders let you ignore those at your own risk (the restriction files are often there for a good reason, as it is possible to get stuck in download loops on page “names” that change every time the program requests them). Nowadays it is becoming increasingly difficult as web designers ditch years of good standards and build their sites out of a pile of dark magic that only works in the most mundane and mainstream of uses.
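
To give a rough idea of what those knobs control, here is a minimal sketch of a depth-limited, same-host crawl in Python. It is purely illustrative - it is not how Scrivener or any of the tools below actually work - and a real downloader would also honour robots.txt, throttle its requests, and filter by file type:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_depth=1):
    """Fetch start_url plus any same-host pages it links to, up to max_depth."""
    host = urlparse(start_url).netloc
    seen, queue, pages = set(), [(start_url, 0)], {}
    while queue:
        url, depth = queue.pop(0)
        url, _ = urldefrag(url)          # drop #anchors so pages aren't fetched twice
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            with urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue                     # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == host:   # crude "same website" test
                queue.append((absolute, depth + 1))
    return pages
```

Even this toy version shows the trade-off: raise max_depth or loosen the same-host test and the download balloons very quickly.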

Some things to investigate:

  1. SiteSucker
  2. wget (a UNIX tool, but has some decent Mac OS X front-ends)
  3. Blue Crab
  4. DeepVacuum

And if you already own DEVONthink Pro, it has a very basic deep downloader built-in that might work for you.

The next problem is what to do with the content once you download it. Many of these tools just dump several thousand files into a directory and let you sort it out. Not, in any way imaginable, useful for Scrivener. Some of the more Mac-savvy programs might give you options to store everything in a series of modified webarchive files, but they won’t link to one another in the way you desire, at least not in Scrivener. The best a downloader can do is rewrite every URL it establishes a download link to, so that it points to the local copy instead of the original URL. Once you drag those files into Scrivener, their names will be changed and they will be relocated into the project file, so the links break, even if Scrivener could handle one imported webpage linking to another within Scrivener (it can’t, last time I checked).
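
For illustration only, that rewriting step amounts to something like the sketch below, assuming the crawler saved each page as a flat .html file named after its URL (real tools handle far more cases, such as images, stylesheets, and query strings):

```python
import re
from urllib.parse import urlparse

def local_name(url):
    """Map a URL to a flat local filename, e.g. example.com_docs_page.html."""
    parsed = urlparse(url)
    path = parsed.path.strip("/").replace("/", "_") or "index"
    return f"{parsed.netloc}_{path}.html"

def rewrite_links(html, downloaded_urls):
    """Point each href at the local copy, when one was downloaded."""
    mapping = {url: local_name(url) for url in downloaded_urls}
    def swap(match):
        url = match.group(1)
        return 'href="{}"'.format(mapping.get(url, url))
    return re.sub(r'href="([^"]*)"', swap, html)
```

As soon as Scrivener imports and renames those files, the rewritten hrefs no longer point at anything, which is the broken-links problem described above.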

Your best option might be to just leave the download directory intact and link to it via a Reference in Scrivener, letting it load in a browser, which will navigate around your hard drive as if it were a remote website, until you reach the boundary of the limits you set on the downloader.