Problems importing from blogs

I’m new to Scrivener, using it under Windows 7 Pro 64-bit. I’ve been trying out various capabilities, including importing material from Websites. Many sites seem to work fine using any of the options available from File > Import > Web Page. However, I’ve tried importing several blogs (into the Research area of the Binder, of course), and they seem to have severe problems no matter what method I use.

For instance, trying to import theologyphdmom.blogspot.com/2012 … maybe.html has the following results.

  • Dynamic Web (Embedded Browser) loads a background texture, shows a progress bar at 49% or 87% (it varies from one try to another) in the footer, along with the URL, and then just sticks there for several minutes, until I give up.
  • HTML (Text Only) similarly downloads the background texture, and the footer shows the URL and a progress bar stuck at 20% or something indefinitely. Sometimes I get the message, “Scrivener appears to be having trouble downloading the entire contents of this web page. Do you wish to ‘Import’ the content downloaded thus far or continue to ‘Wait’ and see if more data can be downloaded?” There are 3 buttons: Import, Wait, Abort. If I choose Import, I still get just the background and the stuck progress bar.
  • Image (Browser Quality) is faster and simpler: it instantly returns the message “URL Import failed.”
  • PDF Document starts off promisingly: there is a progress bar, then a little window showing progress on the conversion. But then it too stalls, I get the “trouble downloading” dialog, and choosing Import displays a document with just the background.
  • Plain Text does the same as Image: “URL Import failed.”

I’ve tried several blogs, and they all have similar problems. Other types of Websites seem to be OK. This is a problem for me, since at the moment blogs are the main type of Website I’m trying to import. Of course copying from my browser and pasting into the document works, but it’s not the same.

Is this a known issue? Can you at any rate reproduce it?

Thanks, I’m taking a look at this and I’m pretty sure it is one of the many JavaScripts littered in the HTML file. This will probably be an issue with anything coming from this blogging service, as most of them look common to it, rather than specific to this page. Merely saving the page from a browser to HTML and editing out all of the scripts by hand does allow me to import the file successfully, but that isn’t a good solution for you, and it doesn’t help us track down which offending script is halting the process, so more research is required.
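For anyone who wants to try the same workaround without editing the file by hand, here is a rough sketch of the script-stripping step in Python. It is only an illustration, not something Scrivener does or provides: the function name strip_scripts and the file names in the usage note are made up for the example, and it assumes you have already saved the page from your browser as an HTML file.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit an HTML document, dropping <script> elements and their contents."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            # Reuse the original tag text so attributes are preserved as-is.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self.in_script:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if not self.in_script:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.in_script:
            self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_scripts(html: str) -> str:
    """Return the HTML text with every <script> element removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

A possible way to use it, with hypothetical file names: read the saved page with open("saved_page.html", encoding="utf-8"), pass the text through strip_scripts, write the result to something like "saved_page_clean.html", and then import that cleaned file instead of the live URL. Whether the cleaned file imports successfully will still depend on the page.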

As you note, copy and paste will always work. In addition, the “readability” browser plug-ins, which locate the page’s main content and strip out all of the non-essential data, sometimes allow you to save the HTML in that cleaner state. Another trick that often works well is to use the page’s print feature, if one exists (this is more common on news pages). Most of these try to trigger the browser’s print function, but after you cancel that you are left with a clean version of the page that rarely fails to import.

The unfortunate fact is that modern web development trends are a massive mess of cross-wiring between Facebook, Google, ad services, comment features and so on, and these often do not play well with download libraries that assume a more straightforward page design.

I’ll update you if we can track down the source and find a way around it. In the meantime, it looks prudent to avoid importing directly from blogspot.com.

Thanks, Amber. I understand the problem. The Blogspot blog whose link I posted has a fairly simple appearance, but I realize there may be all kinds of scripts running there. Other blogs I’ve had problems with are on patheos.com, such as patheos.com/blogs/faithforwa … ng-a-damn/. These pages have a much more complicated, busy appearance, so I suppose they may have even more stuff going on in the background.

I’ll use copy and paste with blogs for now, which works well enough, since it’s generally the text content, not the formatting and links, that I need.

Thanks, I’ll take a look at those as well!