I cannot import any web pages

bmsingh · November 10, 2019, 8:15pm

Is anyone else having issues importing web pages to the research folder? I have tried to import all day, but every time I try it says “web page could not be imported.” I can understand that some will not work but I have not had a single one work all day and this is getting really frustrating. I can copy the link into a document, and I can add the web page as “webpage complete” for the most part, but I really want it to import as “PDF document via webkit”.

DavidR · November 11, 2019, 5:09pm

Hi, and welcome to the forums. Alas, Scrivener v. 1.x has not been able to import Web pages successfully for quite a while. It has to do, I think, with how the formatting of Web pages has advanced and the code underlying Scrivener has … well, not. The good news is that this should be working in the new v. 3 when it comes out. It may already be working in the beta version, if you want to try that. (A number of people already seem to be using it productively.) The only other alternative right now is to save the pages as PDF from your browser and import those into Scrivener.

JJSlote · November 12, 2019, 4:37am

I must respectfully disagree. I’m confident we won’t see a robust direct web page import in Scrivener 3, and I don’t consider that a shortcoming. For Scrivener to process a modern web page with its scripts, cookies, ads, adbugs and endless scrolls, it would have to become a browser in its own right, and that is simply not its remit. I think users will obtain better results by deploying their browsers to save research pages to local files, and dragging those into Scrivener. The MHTML format is preferable to the PDF because the text will zoom and wrap to fit in Scrivener’s editor pane.

Microsoft’s Chromium-based Edge browser has MHTML as an option under Save As. For Chrome, the capability is buried; try the instructions at the site below to enable:

winaero.com/blog/enable-save-as … le-chrome/

Important note: Use file extension .MHT, not the default .MHTML for your single page web file. Scrivener will not recognize it otherwise. Import the MHT file by dragging it into your binder, or by File > Import > Files.
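If a batch of pages has already been saved with the default .mhtml extension, renaming them by hand gets tedious. Below is a minimal sketch of a batch rename, assuming Python 3 is installed; the function name `rename_mhtml_to_mht` and the folder path are hypothetical, and nothing here is part of Scrivener itself. It only changes the extension and skips any file whose .mht name is already taken.

```python
from pathlib import Path


def rename_mhtml_to_mht(folder):
    """Rename every *.mhtml file in `folder` to *.mht and return the new paths.

    Hypothetical helper for illustration only; it does not touch file contents.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively (.mhtml, .MHTML, ...).
        if path.is_file() and path.suffix.lower() == ".mhtml":
            target = path.with_suffix(".mht")
            if target.exists():
                # Skip rather than overwrite a file that already has this name.
                continue
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `rename_mhtml_to_mht(r"C:\Research\SavedPages")` (an example path) would rename the saved pages in that folder, after which they can be dragged into the binder as described above.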

(These suggestions apply to Version: 2.9.0.28 Beta (735337) 64-bit - 06 Nov 2019.)

Good Luck – Jerome

devinganger · November 12, 2019, 7:07am

Just to clarify, do you mean Edge as generally available in all Windows 10 builds, or the new Chromium-based Edge, which is in open beta but has not yet been generally released and is not slated to replace the normal Edge until some future Windows 10 feature update?

The reason I ask for clarification is that Edge as shipped by default on Windows 10 is not built on the Chromium engine; that’s a separate project Microsoft is doing. Folks should understand if they need to download and install the Edge Chromium beta, and all of the supporting pros/cons that entails. I’ve been using it as my main browser since it was still dev channel only and have been pretty thrilled with it – being able to load most Chrome plug-ins has been really nice – but it’s not a 100% replacement for Edge yet.

JJSlote · November 12, 2019, 9:10am

Thanks, Devin. To clarify, the Chromium-based Edge is not the default Edge shipped with Win10. One can obtain it on the beta or a companion channel here:

microsoftedgeinsider.com/en-us/download/

The benighted Win10 default Edge is not even graced with a Save As.

Cheers – Jerome

DavidR · November 12, 2019, 2:07pm

Thank you, Jerome. This is good advice, and I take your point about the difficulty of importing Web pages today, without having an actual browser engine, or doing a lot of pre-processing to strip stuff out. For clarification, do I understand that “we won’t see a robust direct web page import in Scrivener 3” is an impression based on your testing of the beta?

I also use Evernote, which has a good feature for clipping Web pages into its notes, including options to clip articles, simplified articles, full pages, and screenshots. Evernote notes are formatted in a variant of HTML, not RTF, which is why their Web clipper works pretty well, though not perfectly. The simplified article option does particularly well at removing extraneous matter. Maybe someday Scrivener 3.5 or 4 could adopt something like that.

JJSlote · November 12, 2019, 3:10pm

Actually, I regret the point. It’s one I’ve made on the forum with regard to pasting into a Scrivener doc from a web page source. But it doesn’t apply to imports. Scrivener 2.9Xb has a browser, the Qt WebEngine. The built-in browser displays a dynamic web site when we drag the site’s token (the icon just to the left of “https://” in the navigation bar) from another browser into any Bookmarks pane. If we drag that token into the Binder, I’m pretty sure the same web engine saves the dynamic page as an MHT.

Now, if Scrivener has trouble importing a page, and it sometimes will, I’d still suggest saving to MHT in one of the other browsers and dragging the file in. But please disregard the accompanying points about Scrivener’s browser tech and the prognosis for V.3; they aren’t valid.

My bad - Jerome

DavidR · November 13, 2019, 4:43pm

Thanks for the correction, Jerome. It’s actually good news. And it reinforces the point that v. 3 will give a better means of importing Websites than v. 1 has had for ages, maybe ever.