Authentication issue importing Web page

bayareaguy · January 13, 2008, 10:36am

I just tried to import a web page from an internal site which uses https and basic authentication.

It seems Scrivener wasn’t able to do it. It didn’t say exactly why - just that the page could not be imported.

How should I proceed?

KB · January 13, 2008, 10:42am

Probably best just to copy and paste the contents of the page, then. Scrivener uses WebKit’s built-in methods for grabbing a web page’s content, so if the import fails, the page probably doesn’t want to be saved. You can try saving it as a webarchive from Safari and then importing it into Scrivener that way (by dragging the saved webarchive from the Finder into Scrivener), but my guess is that Safari may not want to save it either. Worth a try, though.
All the best,
Keith

bayareaguy · January 13, 2008, 10:53am

I think the problem is Scrivener isn’t following up with the https certificate / http authentication necessary to complete the transaction.

When I opened the site with Safari, I got the “Certificate” dialog. After that I gave my HTTP authentication details (name and password) and I was able to see it fine.
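
For context, the "name and password" step in HTTP basic authentication amounts to sending an `Authorization` header with the credentials base64-encoded. The following is a minimal Python sketch of that step only; it is not what Scrivener or WebKit does internally, the function names are made up for illustration, and it does not address the certificate prompt.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    # Credentials are joined with a colon and base64-encoded (not encrypted).
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that carries Basic credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req
```

A client that never sends this header gets a 401 response from the server, which would be consistent with an importer simply reporting that the page could not be imported.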

I then saved the file as a webarchive and dropped it into Scrivener and it worked fine as well.

One thing that did not work was using Opera’s “Save as”. Opera writes a .mht file containing all the web content as a MIME document. Safari seems to use something very Mac specific.
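
Since a .mht file is a MIME document, it can in principle be inspected with any MIME parser. The sketch below uses Python's standard `email` module to list the parts of such a file; the function name `list_mht_parts` is made up for illustration, and it assumes the file is a well-formed multipart MIME message with a `Content-Location` header on each part, which may not hold for every file Opera writes.

```python
from email import message_from_bytes


def list_mht_parts(data: bytes) -> list:
    """Return (content type, Content-Location) for each leaf part of an MHT file."""
    msg = message_from_bytes(data)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the container; only leaf parts hold page content
        parts.append((part.get_content_type(), part.get("Content-Location")))
    return parts
```

For example, `list_mht_parts(open("page.mht", "rb").read())` would list the HTML, images and stylesheets bundled into a hypothetical `page.mht`.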

KB · January 13, 2008, 11:01am

Scrivener cannot do everything that Safari can do. The certificate check and so on are possible in Safari because it is a dedicated web application with a whole team working on it. Scrivener’s web page import is pretty basic: it just tries to grab the web page. If it hits resistance such as this, it won’t work, and the workaround of saving as a webarchive from Safari is a good one in this instance.
Best,
Keith

P.S. Yes, Safari’s webarchives are very Mac specific, which is decidedly annoying.

AmberV · January 13, 2008, 11:38am

Not only that, it doesn’t seem to save all linked content from the page. I haven’t used webarchive enough to confirm that, but I have heard of people who have tried to load their webarchives a year or two later and find certain content missing. The base HTML gets saved, it seems, but all external stylesheets and images might not. Again, I haven’t confirmed that, but it is enough of a concern to use the word ‘archive’ very gently.
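
One way to check what a given webarchive actually contains: the file is a property list, so Python's `plistlib` can read it. The sketch below lists the URLs of the saved resources. The key names (`WebMainResource`, `WebSubresources`, `WebResourceURL`) are what I understand Safari to use, but treat them as an assumption, and `summarize_webarchive` is a made-up helper name.

```python
import plistlib


def summarize_webarchive(data: bytes) -> dict:
    """List the main page URL and the URLs of any saved subresources."""
    archive = plistlib.loads(data)  # handles both binary and XML plists
    main = archive.get("WebMainResource", {})   # assumed key name
    subs = archive.get("WebSubresources", [])   # assumed key name
    return {
        "main_url": main.get("WebResourceURL"),
        "subresource_urls": [s.get("WebResourceURL") for s in subs],
    }
```

If the stylesheets and images a page needs are missing from `subresource_urls`, the archive would depend on the original site still being online.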

Personally, I just prefer Scrivener’s convert to text, or just grabbing the text via copy and paste. I’d rather be able to edit the result to remove advertisements, unnecessary navigation clutter, annotate it, and so forth.
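
For anyone wanting to do a similar text-only conversion outside Scrivener, here is a rough Python sketch using only the standard library. It is not how Scrivener's own conversion works; it simply drops tags, skips `script` and `style` content, and collapses whitespace. The names `_TextExtractor` and `html_to_text` are made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

The result is plain text, so removing advertisements or navigation clutter afterwards is ordinary editing.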