Compiling to html in Scrivener 3

Jaaaarne · October 2, 2021, 1:27pm

Could please someone advise me if there’s a way to export clean html from Scrivener 3? By default it writes rather untidy html with unnecessary inline styling in simple tags (em, strong), and also adds classes to every paragraph, which is also redundant (default paragraph style is controlled by the file’s stylesheet).

Examples. Instead of <em>italics</em> Scrivener writes the same italics wrapped in an unnecessary span with a class assigned. And instead of <p>This is paragraph.</p> it writes the same paragraph with a class attribute added to the p tag.

Is there a way to turn it off and get clean html without classes and redundant spans? I can’t seem to find a relevant setting in the export screen.

Thanks in advance!

rms · October 2, 2021, 2:06pm

Your question nudged my curiosity … I searched this forum for “html without styles” and found something from many years ago, “HTML and stylesheet”, that still seems pertinent. I did a test compile to “MultiMarkdown → Web Page (.html)” and the resulting html seemed pretty clean. Read the linked thread for some commentary.

There is probably a way to use a text editor with clever search and replace to completely clean up as you wish.
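For anyone who would rather script that search and replace, here is a minimal Python sketch of the idea. It is not a Scrivener feature, and the class names `p1` and `s1` in the sample are hypothetical stand-ins, not necessarily what Scrivener writes. Regular expressions are a blunt tool for HTML, so treat this as a starting point for simple compile output only.

```python
import re

def clean_html(html: str) -> str:
    """Strip class/style attributes and unwrap <span> tags (regex sketch)."""
    # Drop class="..." and style="..." attributes wherever they appear.
    # Caveat: this also matches the same pattern inside body text.
    html = re.sub(r'\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\')', '', html)
    # Remove opening and closing span tags, keeping their contents.
    html = re.sub(r'</?span\b[^>]*>', '', html)
    return html

# Hypothetical sample; the class names are made up for illustration.
sample = '<p class="p1"><em><span class="s1">italics</span></em></p>'
print(clean_html(sample))
# <p><em>italics</em></p>
```

If the compiled page uses spans for anything meaningful, a real HTML parser would be the safer route.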

xiamenese · October 2, 2021, 2:09pm

Any help?

Mark

Jaaaarne · October 2, 2021, 2:45pm

Thanks for the hint. MultiMarkdown to web page does, indeed, produce clean html. However, it introduces another issue: this way en dashes and em dashes get encoded as html entities.
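One possible post-processing workaround for the entity issue, sketched in Python: decode character entities back to UTF-8 while leaving the ones that protect the markup (`&amp;`, `&lt;` and so on) alone. The function name `decode_typography` is made up for this example; `html.unescape` is the standard library call doing the actual decoding.

```python
import html
import re

# Characters that must stay encoded so the markup remains valid.
MARKUP_CHARS = {"&", "<", ">", '"', "'"}

ENTITY = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def decode_typography(text: str) -> str:
    """Replace entities such as &#8212; or &mdash; with the real character."""
    def repl(match: re.Match) -> str:
        entity = match.group(0)
        decoded = html.unescape(entity)
        # Keep the entity as written if it stands for a markup character.
        return entity if decoded in MARKUP_CHARS else decoded
    return ENTITY.sub(repl, text)

print(decode_typography("<p>1990&#8211;2000 &mdash; A &amp; B</p>"))
```

The printed paragraph contains a literal en dash and em dash, while `&amp;` is left untouched.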

However, now you’ve got me looking in this direction, and I’ve found out that I could get clean html with dashes intact by compiling Pandoc to epub, then extracting the xhtml file from it and editing the header to my needs.
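The extraction step can be scripted, since an EPUB file is an ordinary ZIP archive. A small Python sketch follows. The function name is made up, and the layout of the files inside the archive depends on the tool that produced it, so the code simply collects every .xhtml or .html member it finds.

```python
import zipfile

def extract_xhtml(epub_path: str, out_dir: str) -> list[str]:
    """Extract every .xhtml/.html member of an EPUB into out_dir."""
    extracted = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.lower().endswith((".xhtml", ".html")):
                # Keeps the archive's internal folder structure under out_dir.
                epub.extract(name, out_dir)
                extracted.append(name)
    return extracted
```

Calling `extract_xhtml("book.epub", "out")` (both names are placeholders) returns the list of extracted members, which makes it easy to spot the file holding the text.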

It solves my issue. But, frankly, I don’t think this is normal. There should be a setting or something to get clean html from the obvious menu item. I really wouldn’t have guessed to go experimenting through the “Compile for” menu.

Jaaaarne · October 2, 2021, 2:47pm

No, this is a different issue. This is about different software working with Scrivener. However, rms has already given me an idea.

Thanks anyway.

xiamenese · October 3, 2021, 3:51pm

If you read on down you will see that Ioa (AmberV) posted a “clean-compile.zip” working with the MMD–HTML format, rather than compiling simply to MMD. That is what I was actually needing, and it is that which I wondered if it might be of interest to you.

Rdale · October 4, 2021, 2:09pm

Have you tried using the Replacements tab of the compile window to translate those fancy-dashes into plain-text dashes?

AmberV · October 5, 2021, 11:44am

One problem I see here is that there is a bug with the Convert ‘smart’ punctuation to ‘dumb’ punctuation setting, in the Transformations compile format pane, where dashes and ellipses are always converted to ASCII equivalents, which in turn are then converted by MMD back into typographic punctuation (and MMD prefers entities to characters for maximum compatibility). You should be able to turn that option off and get UTF-8 style output.

So once you see that fix come up in the release notes, that should be the approach to take if you want clean HTML with UTF-8 punctuation.

Jaaaarne · October 17, 2021, 2:29pm

It would have actually been an interesting thing, as it is supposed (?) to provide pure html without headers, as shown in the example inside the zip. However, I was not able to make it work. I still get a complete html page (albeit without classes). Seeing how there’s no “Processing” pane in the Compile format settings, it’s probably absent from the Windows version altogether and this solution won’t work.
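Until the settings file works, a complete page can be trimmed to a fragment after the fact. This is only a post-processing sketch in Python, not what the zip does internally. It assumes a single body element and falls back to returning the input unchanged if none is found.

```python
import re

def body_fragment(page: str) -> str:
    """Return the contents of <body>...</body>, or the input if there is none."""
    match = re.search(
        r"<body\b[^>]*>(.*?)</body\s*>", page, flags=re.DOTALL | re.IGNORECASE
    )
    return match.group(1).strip() if match else page

page = "<html><head><title>t</title></head><body>\n<p>Hi</p>\n</body></html>"
print(body_fragment(page))
# <p>Hi</p>
```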

Still doesn’t resolve the UTF-8 punctuation issue. Good to know that it’s a known bug, but I won’t hold my breath for the fix. Seems like something minor to waste the resources on.

Anyway, thanks for your input everyone.

AmberV · October 18, 2021, 12:58pm

Make sure you are setting the Compile for setting to “MultiMarkdown”, so that the Processing pane can work. If you set it to “MMD → HTML” then it will use the hard-coded command line and ignore any settings in the Processing pane.

You may also find this thread useful, where this same workflow is discussed in context of Windows settings.

I’ve also updated the clean-compile.zip file in the previous thread to include both a Mac and Windows example. The only real difference is the path to the executable. I also disabled access to this Format from all compile types other than plain MultiMarkdown, since that seemed to be causing confusion.

Jaaaarne · October 21, 2021, 11:23am

Thank you, the updated settings file indeed works.

However, I’ve just noticed that compiling for MultiMarkdown (any flavour) doesn’t respect the paragraphs created by pressing Enter (same goes for Pandoc, unfortunately, I just didn’t notice it at first). You need to have an empty line for it to recognise it as a new paragraph.

Compiling to Webpage or Epub produces correct paragraphs, but brings in tons of styling garbage. Compiling to MMD or Pandoc removes garbage - together with paragraphs.

AmberV · October 21, 2021, 2:41pm

That’s because there aren’t any paragraphs to respect. Markdown requires an empty line to indicate paragraphs—it makes a lot more sense when you consider it is designed to be used anywhere. Ever seen a plain-text file with no space between paragraphs? Yikes. So a bunch of adjacent lines in Markdown will be combined into one paragraph. This is because many plain-text files will be hard-wrapped (such as email source).
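The rule can be illustrated with a few lines of Python. This is a deliberate simplification of what a Markdown processor does (it ignores lists, code blocks, hard line breaks and so on) and only shows the blank-line behaviour described above.

```python
import re

def markdown_paragraphs(text: str) -> list[str]:
    """Split on blank lines; adjacent lines in a block become one paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
    ]

source = "First line.\nSecond line.\n\nNew paragraph."
print(markdown_paragraphs(source))
# ['First line. Second line.', 'New paragraph.']
```

The single return between the first two lines does not start a new paragraph; only the empty line does.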

If you must for some reason use word processing style paragraphs instead, then the tips given in this thread should help. Even though this refers to the LaTeX template, I have made the same setup in all of the provided stock Markdown-based compile formats.

Jaaaarne · October 23, 2021, 2:13pm

Okay, got it. I’m just used to Typora, which adds those empty lines on single return silently, so that there are none visible when editing a file (and as it works in live preview mode, it looks like a perfect minimal word processor). And when exporting clean html, it gives me exactly what I expect: correct paragraphs and UTF-8 punctuation. I don’t know how the developer achieved that, but that’s what it does. I probably expected Scrivener to do the same for some reason.
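How Typora does this internally is not stated in the thread, but the visible effect (one return in the editor, an empty line in the file) can be approximated with a simple transformation. The Python sketch below is an assumption-laden illustration: it treats every line as its own paragraph, which would be wrong for lists, code blocks or hard-wrapped text.

```python
import re

def single_returns_to_paragraphs(text: str) -> str:
    """Turn every run of line breaks into exactly one empty line."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    return re.sub(r"\n+", "\n\n", text.strip()) + "\n"

print(single_returns_to_paragraphs("One.\nTwo.\n\nThree.\n"))
```

The result has an empty line between "One.", "Two." and "Three.", so a Markdown processor sees three paragraphs.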

AmberV · October 23, 2021, 4:39pm

Oh yes, Typora is an interesting one, in that it almost treats Markdown more like a storage format, where you don’t even see it unless you are editing the span or block itself. You can still type in the syntax yourself, but one could almost use it entirely without knowing a thing of how to, as it has menu commands for almost every kind of formatting.

But as you note, if you go into “code view” you see the paragraphs actually are double-spaced. It’s more a matter of how it treats Markdown writing in the “front end”. It is designed to obscure it as much as possible, so in that way it makes sense to adopt word processing style paragraph entry.

With Scrivener the writing process is more like you would see in a more typical Markdown editor. But, as noted, it has so many options for transforming your text on output, that you can essentially kind of “Typora it” if you really want!

arthaey · November 2, 2021, 6:53pm

If anyone else ends up in this thread searching for a similar problem with all their paragraphs getting combined into a single huge paragraph…

I wasn’t using HTML, Markdown, or LaTeX, just the “Default” format compiled for “Print”, but I was still getting my paragraphs merged. I finally figured out that it was because I had checked the “Convert MultiMarkdown to rich text in notes and text” option (while testing out different compile options).

Uncheck that box, and paragraphs return!