One of the big advantages of using Markdown to generate your HTML for blogging software is precisely how clean it is, and how devoid of any opinions on how it should be displayed. Scrivener’s stock HTML output, by contrast, carries a massive tangle of extremely opinionated CSS rules (stuff like indent settings, fonts, font sizes, line heights, paragraph padding, kerning…), so much so that if you paste it right into a blog it stifles the blog’s native look and feel. As you noted above, that is something you’d have to deploy a bunch of messy RegEx to clean up. I wouldn’t recommend anyone go down that path. Scrivener’s native HTML is better for those who have no clue what any of these abbreviations mean and just want to upload an .html file that looks somewhat like what they told the compiler to make.
If instead you have this, which is more like what Markdown is going to give you:
<h2>Article Subheading</h2>
<p>First paragraph</p>
<p>Second paragraph</p>
Then it is entirely up to the blog’s theme and overall style settings to make that look the way it should. No processing, no post-processing… you’ve just got some CSS in WordPress’ setup somewhere that looks something like this:
/* Only paragraphs following other paragraphs are indented */
p { text-indent: 0 }
p + p { text-indent: 0.25em }
So it is for this reason that I don’t really include much by way of examples for HTML formatting in Scrivener’s Markdown compile format list. In most cases people want that option because they either have their own CSS or want to plug the HTML into some system that does.
For the LaTeX compile formats, on the other hand, you’ll find lots of examples. I’d like to provide DOCX/ODT examples, but limitations in the compile system make that difficult (to style things properly there you need template files, and a reliable means to point at them on the user’s system). Plus, on the Windows version the Processing pane was never even finished and lacks an embedded script facility, meaning I couldn’t even do a trick like packing binary data into a self-extracting script. And it’s debatable how much of an “example” that would be to anyone without programming experience. Anyway, I digress.
Yeah, I get that, but I would have thought that since (1) Scrivener already knows I want a first-line tab for each para, and (2) MD supports HTML, so the Scrivener compilation could have inserted three or four at the start of each para, I sort of hoped that the fact that Scrivener wasn’t doing that was simply because I wasn’t pressing the right buttons in Scrivener compile.
So, couple of things here:
- Scrivener works in a fundamentally different manner when you compile via one of its MMD or Pandoc settings. It’s better to think of it as almost a plain-text editor. That’s not really the case in many important ways, but it is a better place to start from than thinking things like “Scrivener knows I want a first-line tab”; for this, no, it does not.
- Whatever you just did there to “indent” is not how you want to do indents in an HTML file (or pretty much any kind of file). The example I gave above should illustrate what is more like what you want to see and do. So even if Scrivener did have higher-level “indent paragraphs following X” type controls in the Markdown workflows, it would not be converting literal tab characters to non-breaking spaces or anything like that. It would most likely be generating CSS and inserting that into the Pandoc/Markdown metadata block so that it can be inserted into the HTML file’s <style> block, or generating a sidecar .css file along with the .html.
But we don’t, because that is a rabbit hole we don’t want to go down, and in most cases MultiMarkdown and Pandoc already have an excellent infrastructure for formatting the raw .md file into something pretty, especially for HTML. Reinventing all of that doesn’t seem like a good use of time.
To conclude, I do think you’re on the right track with using MD here, just expect less formatting from it, because in most cases like WordPress, less is more.
So to wit, I would just strip the leading tabs out if anything (though it doesn’t really hurt anything to leave them in, as they will be ignored by most browsers; shown as a single unwanted space at most). It’s better to let the WordPress theme handle indent policy, or add the CSS to the HTML file to let it handle that, which you can do by adding an “HTML Header” metadata key to your compile Format’s Metadata pane. But I would only go down the latter route if I was making a simple static site. For any decent CMS or similar, it’s going to want clean HTML and minimal CSS.
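To make the static-site route a bit more concrete, here is a rough sketch in Python of what the finished file amounts to: the same clean fragment as above, with the indent rule from my CSS example carried in the page’s own head. The names `style_element` and `standalone_page` are made up for illustration, and this is not what Scrivener or MultiMarkdown runs internally; it only shows the shape of the result.

```python
from html import escape

# The indent policy from the CSS example earlier in this post.
INDENT_CSS = "p { text-indent: 0 } p + p { text-indent: 0.25em }"


def style_element(css: str) -> str:
    """Wrap CSS in a single-line <style> element (hypothetical helper)."""
    # Collapse runs of whitespace so the whole element fits on one line.
    return "<style>" + " ".join(css.split()) + "</style>"


def standalone_page(title: str, body_html: str, css: str = INDENT_CSS) -> str:
    """Wrap a clean HTML fragment in a minimal page that carries its own CSS."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"{style_element(css)}\n"
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )


# The kind of fragment Markdown gives you: no styling opinions at all.
fragment = (
    "<h2>Article Subheading</h2>\n"
    "<p>First paragraph</p>\n"
    "<p>Second paragraph</p>"
)
page = standalone_page("Example", fragment)
print(page)
```

The one-line string that `style_element` returns is the sort of thing you would put in the “HTML Header” value. For WordPress you would skip all of this and paste only the fragment.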
And in fact there is one important reason you would want to strip out leading tabs for MD files: in Markdown, a line that starts with a tab means “this line is a code block line”. But I would go about this by using the Edit ▸ Text Tidying submenu to strip leading tabs out of the source, rather than a Replacement. As I say, in nearly every case, rich text and Markdown alike, tabs are not what you want for indenting paragraphs anyway.
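If you already have a compiled .md file sitting outside of Scrivener and just want to see what that clean-up amounts to, a minimal Python sketch might look like the following. The function name `strip_leading_tabs` is my own invention, and note the assumption baked in: it removes every leading tab, so it would also flatten any code blocks or nested lists you had deliberately indented with tabs.

```python
import re

# Matches one or more tabs at the very start of a line.
# re.MULTILINE makes ^ match at the start of every line, not just the first.
LEADING_TABS = re.compile(r"^\t+", flags=re.MULTILINE)


def strip_leading_tabs(markdown_text: str) -> str:
    """Remove tabs at the start of each line; tabs elsewhere are left alone."""
    return LEADING_TABS.sub("", markdown_text)


# Tab-indented paragraphs, as a first-line-tab writing habit would produce.
source = "\tFirst paragraph\n\n\tSecond paragraph\n"
cleaned = strip_leading_tabs(source)
print(cleaned)
```

Fixing it in the project itself is still the better route, since then every future compile comes out clean.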