I’m new to Scrivener, using it under Windows 7 Pro 64-bit. I’ve been trying out various capabilities, including importing material from Websites. Many sites seem to work fine using any of the options available from File > Import > Web Page. However, I’ve tried importing several blogs (into the Research area of the Binder, of course), and they have severe problems no matter which method I use.
For instance, trying to import theologyphdmom.blogspot.com/2012 … maybe.html has the following results.
- Dynamic Web (Embedded Browser) loads a background texture, shows a progress bar at 49% or 87% (it varies from one try to another) in the footer along with the URL, and then just sticks there for several minutes, until I give up.
- HTML (Text Only) similarly downloads the background texture, and the footer shows the URL and a progress bar stuck indefinitely at around 20%. Sometimes I get the message, “Scrivener appears to be having trouble downloading the entire contents of this web page. Do you wish to ‘Import’ the content downloaded thus far or continue to ‘Wait’ and see if more data can be downloaded?” There are three buttons: Import, Wait, Abort. If I choose Import, I still get just the background and the stuck progress bar.
- Image (Browser Quality) is faster and simpler: it instantly returns the message “URL Import failed.”
- PDF Document starts off promisingly: there is a progress bar, then a little window showing progress on the conversion. But then it too stalls, I get the “trouble downloading” dialog, and choosing Import displays a document with just the background.
- Plain Text does the same as Image: “URL Import failed.”
I’ve tried several blogs, and they all have similar problems. Other types of Websites seem to be OK. This is a problem for me, since at the moment blogs are the main type of Website I’m trying to import. Of course copying from my browser and pasting into the document works, but it’s not the same.
Is this a known issue? Can you at any rate reproduce it?
As you note, copy and paste will always work, and sometimes the “readability” browser plug-ins that locate the page’s main content and strip out all of the non-essential data will let you save the HTML in that cleaner state. Another trick that often works well is to use the page’s print feature, if one exists (this is more common on news pages). Most try to trigger the browser’s print function, but after you cancel that you are left with a clean version of the page that rarely fails to import.
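For anyone curious what that “cleaner state” amounts to, here is a minimal sketch of the stripping idea using only Python’s standard library: it walks the HTML and drops `<script>` and `<iframe>` blocks, which are the usual culprits when an importer stalls on third-party widgets. The sample HTML is invented for illustration; real readability plug-ins do far more than this.

```python
from html.parser import HTMLParser

class ScriptStripper(HTMLParser):
    """Rebuild HTML while dropping <script> and <iframe> elements
    (and everything nested inside them)."""
    SKIP = {"script", "iframe"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1
        elif self.depth == 0:
            # Re-emit the tag; boolean attributes have value None.
            attr_text = "".join(
                f" {k}" if v is None else f' {k}="{v}"' for k, v in attrs
            )
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

def strip_scripts(html: str) -> str:
    parser = ScriptStripper()
    parser.feed(html)
    return "".join(parser.out)

sample = ('<html><body><p>Post text</p>'
          '<script>trackVisitor();</script></body></html>')
print(strip_scripts(sample))
# → <html><body><p>Post text</p></body></html>
```

Saving the page source from your browser and running it through something like this before importing can sometimes get a stuck page past Scrivener’s downloader, though copy and paste remains the surest route.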
The unfortunate fact is that modern web development trends are a massive mess of cross-wiring between Facebook, Google, ad services, comment features and so on, and these do not often play well with download libraries that assume a more straightforward page design.
I’ll update you if we can track down the source and find a way around it. In the meantime it looks prudent to avoid attempting to directly import from blogspot.com.
Thanks, Amber. I understand the problem. The Blogspot blog whose link I posted has a fairly simple appearance, but I realize there may be all kinds of scripts running there. Other blogs I’ve had problems with are on patheos.com, such as patheos.com/blogs/faithforwa … ng-a-damn/. These pages have a much more complicated, busy appearance, so I suppose they may have even more stuff going on in the background.
I’ll use copy and paste with blogs for now, which works well enough, since it’s generally the text content, not the formatting and links, that I need.
Thanks, I’ll take a look at those as well!