Rendering web page text

I have imported a web page: http://www.scriptfrenzy.org/node/402156.

The text should read “You’ve heard of the one-act play.” But instead of the apostrophe, I see the character string .

Why does this happen?
Is there anything I can do to get the text to be rendered with proper apostrophes?

Hm. It’s happening because the special characters (smart quotes, ellipses, etc.) are typed into the HTML source as raw characters rather than written as character codes, and the web tool Scrivener uses can’t read them that way. I’ll make sure Lee knows, but I don’t know how easy a fix this will be.

Meanwhile, the easiest thing for you to do would be to select the text, copy and paste it into a new document, and add a link back to the page in the document references. Alternatively, you could drag and drop the page into your binder (load it in the browser, then drag the icon/button at the left of the address bar into the project binder). That’s basically just using the editor as a browser, though, so you’d need an internet connection to view it, and the page isn’t archived at all (if it gets removed from the server, for instance, you wouldn’t have a copy any more). Depending on what you’re doing, that might be fine.

If you use Safari, you can save the page as a .webarchive and then import that into your project; with other browsers you can get extensions for saving to PDF, which you could then import in the same way. (Or if you have a Print to PDF service like CutePDF installed, you can use that.) Either route would preserve the layout and images, so the page would look more like it does in the browser. You could also save the page as HTML, open it in Word, and save it in another format there, but that seems needlessly complicated, whereas the other options have general use.

Hmmm… Are you sure? Here’s some text from the Scrivener copy of the HTML file in question:

In that last paragraph, it looks to me like the quote character in “You’ve” is simple HTML. Perhaps I’m missing something.

And not only does the HTML in the last paragraph (which is rendered incorrectly) appear to be simple inline HTML, the HTML source in the paragraph before that – again, simple inline HTML – contains two apostrophes and they are rendered correctly.

Curiouser and curiouser…

Yes, I’m sure the source HTML is using non-ASCII characters. If you view the source, you’ll see the quotes are “smart” quotes rather than simple quotes–you can see it in your posted sample above: the single quote in “You’re” in the first paragraph is straight whereas the one in “You’ve” is curled. Likewise you should, if you take a sample from lower on the page, see the ellipsis character used rather than three periods. You can also see a couple places where they did use a code, e.g. at the end of the second paragraph you quoted you can see “You got it—a big old”–and you’ll note that em-dash renders correctly when you import the page into Scrivener.
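If the difference is hard to spot by eye, a few lines of Python can list every non-ASCII character in a snippet by code point and Unicode name. This is only a rough sketch: the `describe_non_ascii` helper and the sample string are made up for illustration, with the sample retyped by hand using escapes rather than copied from the actual page.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Hand-typed sample: a straight apostrophe, a curly one, and an ellipsis.
sample = "You're ready. You\u2019ve heard of the one-act play\u2026"

for ch, code_point, name in describe_non_ascii(sample):
    print(code_point, name)
# prints:
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+2026 HORIZONTAL ELLIPSIS
```

The straight apostrophe in "You're" does not show up in the list because it is plain ASCII; only the curled one and the ellipsis are reported.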

Amazing! I had to put the two quote characters immediately next to one another to see that they were not the same. Thank you for having a better set of eyes than I! And my apologies for not understanding what you meant.

I might just write a Python script to do a text substitution. Wouldn’t be too hard.
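Something along these lines is what I have in mind. It is a rough sketch, not tested against the real page: the function names and the replacement table are my own inventions. Since the characters that were written as codes seem to import fine, one option is to turn every non-ASCII character into a numeric character reference; the other is to swap in plain ASCII look-alikes.

```python
# Hypothetical replacement table; extend it with whatever the page uses.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / curly apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # ellipsis
    "\u2014": "--",   # em-dash
})


def to_char_refs(html_text):
    """Replace each non-ASCII character with a numeric character reference."""
    return html_text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_plain_ascii(text):
    """Replace the characters in the table with plain ASCII equivalents."""
    return text.translate(ASCII_EQUIVALENTS)


sample = "You\u2019ve heard of the one-act play\u2026"
print(to_char_refs(sample))    # You&#8217;ve heard of the one-act play&#8230;
print(to_plain_ascii(sample))  # You've heard of the one-act play...
```

For a saved copy of the page, I would read the file with its real encoding (I am assuming UTF-8), run the text through `to_char_refs`, write it back out, and then import the result.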

So then I take it that the HTML renderer does not handle Unicode characters?
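My guess, and it is only a guess since I can't see inside the importer, is that the characters themselves survive but their bytes get read with the wrong encoding. Here is a minimal Python sketch of that failure mode, assuming the page is stored as UTF-8 and decoded as Windows-1252 (both encodings are assumptions on my part, and `misdecode` is just an illustrative helper):

```python
def misdecode(text, actual="utf-8", assumed="cp1252"):
    """Encode text with its real encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


curly = "You\u2019ve"  # curly apostrophe, U+2019
straight = "You're"    # plain ASCII apostrophe

print(misdecode(curly))     # Youâ€™ve
print(misdecode(straight))  # You're
```

If that is what is going on, it would explain why the straight apostrophes come through untouched while each curly one turns into a short string of odd characters.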