Sigil Says: "HTML files not well formed..."

When I compile to an ePub and open it in Sigil, I get the error message “This ePUB has HTML files that are not well formed…” Sigil then attempts to correct the problem, but I can’t tell whether it succeeds.

Is this a known problem, and if so, what can I do about it? I’d like to avoid figuring it out by trial and error.



Unfortunately it might be something specific to the content. I’m not having any difficulties with validation using some testing material. The ePub opens fine in Sigil 0.7.4, and passes the built-in FlightCrew check from the Tools menu. Sigil does come with copies of Pretty Print and HTML Tidy embedded, so it can and will automatically clean up HTML. Thus, if the file passes validation after the cleaning, and you don’t see anything obviously wrong with it, it should be good to go. You may be able to find a more detailed report somewhere of what it changed.

OK, I’ll selectively compile different chapters to see what causes that message.

I isolated it to one short chapter. When converted to MOBI, the paragraphs after the error come out as bulleted paragraphs instead of regular ones.

The error that Sigil reports is: error on line 13 at column 81: Attribute redefined.
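
An “Attribute redefined” error usually means that a single tag carries the same attribute twice, which XML does not allow. As a rough way to find such spots without trial and error, here is a minimal sketch that scans every HTML file inside an ePub with Python’s standard-library XML parser. The function names are made up for illustration, and the parser words its messages differently from Sigil (it says “duplicate attribute”), so treat it as a quick pre-flight check rather than a replacement for Sigil or FlightCrew. It may also flag things Sigil tolerates, such as named HTML entities.

```python
import zipfile
from xml.parsers import expat


def find_xml_error(data):
    """Return (line, column, message) for the first well-formedness
    error in an XML/XHTML document, or None if it parses cleanly."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True = this is the final chunk
    except expat.ExpatError as err:
        return err.lineno, err.offset, expat.ErrorString(err.code)
    return None


def check_epub(path):
    """Yield (file name, line, column, message) for each malformed
    HTML file in the ePub (an ePub is a zip archive)."""
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                problem = find_xml_error(book.read(name))
                if problem:
                    yield (name, *problem)
```

To use it, loop over `check_epub("book.epub")` (the file name is a placeholder) and print each result. Running it on the compiled file right before publishing is one way to catch an error that slipped in at the last minute.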

If you want me to send you the epub file from that one chapter, I can do that.

I’m having a very hard time fixing that chapter. I cut all the paragraphs and pasted them back in without formatting, and I’m still getting errors.


I got it fixed by copying everything out and then back in again as text, and then carefully reformatting.

A number of times I’ve had problems in which regular paragraphs were replaced by bulleted paragraphs. I’m concerned that it’s an error that could creep into my document at the last minute and not be discovered until after publishing.

Can you give me any insight as to what might be happening here and how to avoid it?



My guess would be that some text got pasted in from another word processor. Lists are a common area of mild incompatibility between RTF editors, as not all programs describe lists the same way. This can result in “phantom” lists that suddenly pop up in normal paragraphs and other oddities; it could be that a malformed list command or two in the source RTF was generating bad HTML. It’s strange that it survived Paste and Match Style initially, though. My first guess, given that, would have been a stray invisible Unicode character. Sometimes those can be a pain and mess things up, while also being difficult to find on account of their invisibility. But it sounds like that wasn’t the problem either.
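
If you ever want to rule out invisible characters directly, one option is to paste the suspect text into a small script. The sketch below is only an illustration (the function name is made up): it flags Unicode control and format characters, which covers the usual suspects such as zero-width spaces, soft hyphens and byte-order marks, while ignoring ordinary tabs and line breaks.

```python
import unicodedata


def find_invisible(text):
    """Return (offset, code point, name) for each control or
    format character in text, skipping tabs and line breaks."""
    hits = []
    for offset, ch in enumerate(text):
        if ch in "\n\r\t":
            continue
        # Cc = control characters, Cf = invisible format characters
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "UNNAMED")
            hits.append((offset, f"U+{ord(ch):04X}", name))
    return hits
```

An empty result means none of these characters were found; it does not rule out other causes, such as leftover list formatting.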