Suggestions wanted for Scriv to PDF pipeline: via pandoc, or other?

Hey everyone. In the past few months, I’ve gotten much more comfortable using pandoc as an intermediary for ePub production. Adding custom markdown to my formats and tweaking the CSS has given me a basically one-click pipeline: Scriv → pandoc → Sigil for verification. Now I’m looking to do the same for the PDF version of the same book, with the goal of commercial print-on-demand availability.

As I have some specific character style requirements, I’d prefer a route that allows me to preserve that markup so I can render it my preferred way. Small caps, for example, and a style that forbids space-breaking when justifying text.

The default 6x9" paperback PDF compile format is fine, and I can clone and tweak that to get close… but I am a bit spoiled now. :smirking_face: As a career nerd, I appreciate having a million command line options and a config file that I can iterate over.

So, I’m looking for suggestions here. I glanced at Scribus, but am not sure how one pours content into it without twiddling. And honestly, it feels like work. I’d prefer spending a week to tune a config file to make pandoc just operate as desired.

Is the default pandoc multi-markdown “lossless” here, and does Scrivener give me enough options to launch pandoc and generate that PDF? Or am I looking at some more scripted solutions?

You’re in the right place, even if you need a more scripted solution. Start by selecting “MultiMarkdown” from the Compile For dropdown (yes, the name is legacy and confusing at this point), then duplicate the “Basic Pandoc” compile format and go from there in the compile format designer. You’ll have all of the things you are used to from the ePub pipeline, with regard to styles and such.[1]

You may find some things don’t work as well, since Pandoc itself is still (I feel, anyway) somewhat in a transitional phase when it comes to custom styles and how different output file types handle them. HTML-based output has the best support for things like named ::: div blocks and [inline]{attributes}. How well those work with its PDF output (and whether that is using LaTeX) is up to Pandoc, or more accurately your output filter, not so much what Scrivener itself can do.

Short answer: you might need different markup methods in your style configuration than what worked for ePub, but you should be able to get where you need to go.[2]

That all aside, you’re ultimately looking for the Processing compile format option pane, which is only available to the “MultiMarkdown” setup. This is where you plug in your Pandoc path, and then provide the command-line options you want. With that you can get one-click output for most basic things. For more complex things, a simple wrapper script that you target from this pane is the way to go.
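To make the wrapper idea concrete, here is a minimal sketch in Python. It only assumes that the Processing pane hands your script the path of the compiled Markdown file; the function names, the choice of xelatex, and the optional defaults file are all hypothetical, so adjust to taste.

```python
import subprocess
from pathlib import Path


def build_commands(source, formats=("epub", "pdf"), pdf_engine="xelatex", defaults=None):
    """Return one pandoc argument list per requested output format.

    `source` is the Markdown file produced by the compiler. `defaults` is an
    optional pandoc defaults file (hypothetical; leave as None if unused).
    """
    source = Path(source)
    commands = []
    for fmt in formats:
        # pandoc infers the output format from the file extension.
        cmd = ["pandoc", str(source), "--from", "markdown",
               "--output", str(source.with_suffix("." + fmt))]
        if defaults:
            cmd += ["--defaults", str(defaults)]
        if fmt == "pdf":
            cmd += ["--pdf-engine", pdf_engine]
        commands.append(cmd)
    return commands


def run_all(source, **options):
    """Run every command in turn, stopping at the first pandoc failure."""
    for cmd in build_commands(source, **options):
        subprocess.run(cmd, check=True)
```

Calling run_all with the compiled file’s path would produce an ePub and a PDF next to it in one go. Keeping the command construction separate from the execution makes it easy to print the commands and try them by hand first.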

Once you’ve got that concept mastered, you will no longer be limited to the DOCX, ePub and DocBook entries in the Compile For menu.

There are some complex examples of this workflow on the forum here, such as the Scrivomatic setup, that might be worth taking a look into, as well as Quarto integration, which uses Pandoc to streamline PDF production.


  1. Alternatively, you could switch back over to Pandoc→ePub and edit your compile Format for that. Once you have the format designer window open, click the gear button in the left sidebar header area, and add the “MMD” checkbox to make this format available to plain “MultiMarkdown”. You might want to duplicate that though instead of editing it directly. Chances are your PDF workflow will need to diverge from what ePub wants.

  2. For instance, I can get a custom class with [inline text]{.named-style} with ePub, but if I’m going for word processing output, I need [inline text]{custom-style="named-style"}. This is what I mean by it feeling transitional. I don’t understand why this couldn’t be abstracted, but that’s just my opinion.


Small caps is well handled by Pandoc, but this will require a bit of output-specific tweaking, as it depends on the layout engine capabilities of the final target.

As you are already comfortable with CSS, a Scrivener → Pandoc → HTML PDF engine is worth considering. I’ve long tinkered with PrinceXML, as Håkon Wium Lie, one of the creators of CSS, is closely involved with it. Ultimately TeX and now Typst are better for my needs, but I think HTML for print is somehow more elegant, and it is only for complex layouts that the dependency on JavaScript solutions makes it harder to use.

Scrivener → Pandoc → Typst is the other way I would go. You can tweak the Typst templates to your liking and Typst has multiple ways of marking chunks of text for special formatting etc.

Excellent ideas, thank you both. I’m thinking that I want fewer applications involved, not more, but that’s more from a sense of simplicity than anything else. I’m not sure what an HTML conversion would buy me that well-styled MMD will not: ultimately, it’s just structured data. I’m considering what the book distributors accept as input, too. I need to play around with it, I think.

The key part is finding that map of my semantic tagging (“don’t allow whitespace in here”) to the target language’s own styling. ePub was easy, relatively, because it all happened at the CSS layer. Once I’ve worked out the mechanism to get the styling markup in place, pandoc properly turned them into span or div elements, and then I could target them. Figuring out how to do that in another print-ready format is my challenge.

Will for sure check out the pure-pandoc options. Just from some samples I’ve seen around, and from my ePub transformations, it looks capable of managing things like “format for a 6x9 page” and “generate a ToC” and so on. A Scriv->LaTeX->PDF pipeline would be the real dream, I think, but I need to baby-step my way in there. Knowing that I can do this without a lot of monkey-mousing around in another tool is ideal.

Not sure why you’d want to go the pure LaTeX route. It is way more complex than is needed for a relatively straightforward layout. LaTeX is excellent for very bespoke layouts that necessitate a high level of arcane knowledge, and endless searching on stackexchange. The Scriv->Pandoc->XXX route tries to keep the document identical. The rules for whatever XXX is can be tailored step by step. If you need XXX to become LaTeX PDF, that is easy, but you are not constrained by TeX (pandoc’s pdf engine supports pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, pagedjs-cli, prince, context, groff, pdfroff, and typst; phew!). I would contend the end goal for you is that you can spit out BOTH an EPUB and a PDF from a single Scrivener compile. My pandoc workflow almost always spits out multiple formats simultaneously, then I choose which one to use.


Nerd cred, mainly. LaTeX has always felt like the ultimate in customizable, codable layout solutions, and I have a deep abiding respect for its longevity and creator. But yeah, very likely overkill.

Except for a few solvable riddles (Dude, where’s my ToC?) the stock Scrivener PDF paperback output is fine. Then I looked at my few uses of custom character styles and decided what I really needed was a system like the CSS + XHTML, but for print.

I almost certainly don’t need that. Not right now, at least. What I really need, I think, is to give myself permission to flail around a little and experiment, and maybe finally actually RTFM for pandoc and friends.


My mantra is that the thing you already know is usually going to be the best tool for the job, even if it is technically overkill. I use LaTeX to pretty-print my journal entries. That’s like using Autodesk Maya to make an animated GIF, but if you already know Maya, maybe that’s actually faster than some hands-tied-behind-your-back simpler tool you don’t know yet.


Just to add: if you will ever—ever—need to output to docx, then the pandoc-based approach is what you want. For example, many academic presses will not accept a tex or pdf document as a final submission for publication.


Well, after some initial tries, routing through LaTeX’s default “book” style is automatically worlds better than the HTML->PDF experiments I’ve run. It’s not academic publishing but fiction, with all the general style expectations that come with it.

I need to do more digging and sort through pandoc’s config file some more, especially around page geometry, but I feel like I’m already starting from a better place here than with weasyprint->PDF processing.

Will keep poking at it for sure!

If it turns out you need more than “Book”, the class I tend to use the most is Memoir. Its main downside is how much it does: it is a very big class with a ~600-page user manual. On the one hand, that means you don’t have to go package hunting as often to solve a problem. But this can be a negative too: it has so many packages merged into it that, if what it does isn’t what you want and some other package is, you can occasionally run into conflicts.

What you do get though for that is a good leg up, when just starting out. Memoir is going to do the hard part of finding and integrating a bunch of useful general purpose book making packages together, provides you with a decent gallery of chapter designs, page layout designs and so on, and most of them are fairly easy to customise to a surprising degree. If you’re satisfied with its design gallery to choose from, then you’ll be hitting much less of a brick wall of learning.


Interesting… “book” is pretty good so far, but it’s early days. I’ll try it out. :+1:

Edit: Ah, memoir supports \part which is necessary for this title (of course I’d learn on a complicated example!) Now to figure out how to make that transformation happen from Scrivener → markdown → LaTeX :thinking:

While you are aiming for LaTeX as your engine, for completeness on the Typst side, here are some bookish templates:

Pandoc also uses templates for all its output formats, so for Typst you create a custom Pandoc template that references the pre-made Typst template, and this gets imported for you. For example, to get wonderous-book, add #import "@preview/wonderous-book:0.1.2": book to the start of your Pandoc template, then use Pandoc metadata to fill in the fields. In fact, I have some Scrivener test projects available for book templates with Typst here and in the following posts:

You can see the actual Typst generated from Scrivener via Pandoc and resulting PDF live here (in this case I downloaded min-book as a file and include it rather than link to it online):

The great thing is you can tweak the Typst and see the changes live. When you are happy, copy it back to your Pandoc template and voila.


This was the magic. Once I found the right option to treat my # Top headings as a \part it’s very much clicked into place. I’m still wrangling with some features, like redundant section headings inserted by both me and memoir but I expect that’s me not yet finding the right variable or option.

One key question I have for you, though: how, if at all, do you manage the \frontmatter? I’ve used Scrivener’s existing folder structures to place my desired frontmatter, e.g.

Front Matter/
  Manuscript/
  Paperback/
     Title Page
     Copyright
     Dedication
Main Project Folder/
  Part 1/
    Chapter 1/
...

Including the Front Matter/Paperback folder in my compile places it after the \mainmatter instead of after \frontmatter.

I’m trying to avoid putting “learn Haskell” on the mental todo pile :sweat_smile: and am much more inclined to leave markers in my text and post-process the resulting TeX file prior to the hand-off to PDF conversion.

I’m happy with pandoc’s overall approach, and very happy with LaTeX, but things seem to be a little more inadequate on the MultiMarkdown side of the world.

Are you compiling frontmatter on its own and using pandoc to insert it after the header? Enquiring minds would love to know.

memoir is perfect for me here, I think. The manual is indeed beefy, but I can easily ignore sections not relevant to me, like figures, tables, and citations. It’s quite digestible if I ignore all the stuff not in my work.

I’ve never had that problem. If you compile to plain Markdown without any processing, do you get double-headings? Maybe you’ve got them typed into the text as well as being generated by the compiler from binder titles?

One key question I have for you, though: how, if at all, do you manage the \frontmatter? I’ve used Scrivener’s existing folder structures to place my desired frontmatter, e.g.

A custom Pandoc template is the answer here (don’t worry, no Haskell, or even Lua, necessary!). Templates have their own basic codes for inserting bits of your document, or checking for conditions. It reminds me of ERB, if you’ve ever messed with Rails, or even HTML/PHP in simple cases. The template file is the content, foremost, and you can sprinkle light conditionals and variable insertion into it where needed.

I don’t always need this approach, but for this and other things where I would like to insert raw LaTeX markup, I have a Section Type called “Raw LaTeX”, which is designed for passing through markup. With Pandoc you don’t have to worry about it as much and can just use “as-is” to avoid it generating a ToC section, but with MultiMarkdown you do need to use a special verbatim environment, as it assumes you’re writing about LaTeX rather than using it. :wink: The Section Layout can insert that for me, so I don’t really think about it while writing, and the source works equally well for Pandoc.

Thus I have a “— Start Main Matter —” item in the binder that I can drag up or down. For that to work right, I remove the \mainmatter call from any templates I use with it, so that I can have that control.

Here is a snippet of the default LaTeX template (acquired via pandoc -D latex):

$if(lof)$
\listoffigures
$endif$
$if(lot)$
\listoftables
$endif$
$if(linestretch)$
\setstretch{$linestretch$}
$endif$
$if(has-frontmatter)$
\mainmatter
$endif$
$body$

You can just remove the three if-then-end lines that output the \mainmatter markup for a crude fix. For a fancier fix you could dig into how the template codes allow for variables, and set this as part of your YAML, running an if check for that setting. :slight_smile:
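If you would rather script the crude fix than hand-edit the dump, here is a rough sketch in Python. It assumes the conditional appears exactly as the three lines quoted above, which may differ between Pandoc versions, and the function name is made up for illustration.

```python
import re

# Matches the three-line conditional from the snippet above, exactly as quoted.
# Assumption: your pandoc version's default template uses this same form.
MAINMATTER_BLOCK = re.compile(
    r"^\$if\(has-frontmatter\)\$\n\\mainmatter\n\$endif\$\n",
    flags=re.MULTILINE,
)


def strip_mainmatter(template_text):
    # Drop the automatic main matter switch so the document itself decides
    # where the main matter begins.
    return MAINMATTER_BLOCK.sub("", template_text)
```

You would dump the default with pandoc -D latex into a file, run its text through strip_mainmatter, save the result, and point Pandoc at it with --template. If the pattern does not match, the text comes back unchanged, so it is worth checking that the output really lost those lines.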

In fact if you want to get really brute-force about it, you could dispense with most of the if-thens, since this is for your document anyway. For some of my Pandoc templates they are indeed that simple and “hard-coded”. There is just a little here and there for inserting the main matter where it should go, setting up the title page, and stuff like that. The default has to accommodate everyone that wants to use Pandoc out of the box, and so it is necessarily sprawling. There is nothing wrong with having a template file that is 99% static LaTeX preamble if that’s all you need.

…seem to be a little more inadequate on the MultiMarkdown side of the world.

It’s good for what it is, an ultra-tiny and extremely fast processor that can be embedded anywhere. It produces decent output, and if you have no issues with how it does that, it can be fine.[1] Fortunately it’s not terribly difficult to go down that path a ways, realise you need more, and switch to Pandoc. At the very least you can use pandoc -f markdown_mmd to handle what few bespoke Markdown variations there are between them, during the transitioning phase.

P.S. You may want to drop the “Title Page” from your front matter folder. If you look through the default template, you’ll see there is a section that will build that for you, based on the YAML metadata (title, author, date, subtitle, etc.). Unless you want to go wild with the design of it, I suppose, though in my opinion that should be done with the template rather than the document.


  1. The Scrivener user manual PDF is a memoir project, and uses MMD to get there. I will be switching to Pandoc in the future though, as I will be the first to admit that it has outgrown it.


I think what I’m still adjusting to is “give unto pandoc that which is pandoc’s”

I’m preparing my part and section titles in Scrivener’s section layouts, so the markdown output looks something like:

[Part One: The First Part]{.part}

This properly applied my class part when making an ePub, so I could target and style it. What happens in LaTeX → PDF is messier, though:

Part 1

Part One: The First Part

I’m getting LaTeX’s own default heading (“Part 1”) on there as well as my desired title from the source. This might be best solved in a template, as you say. I’m expanding my knowledge about where to put these things. But I still feel like my section titles should be the only ones present. I very much don’t want this, for example, which happens now.

Chapter 23

Acknowledgements

Again, an auto-generated title, followed by my desired title.

What I’m doing to debug right now is cutting Scrivener out of the loop and just processing a dead-simple markdown file in pandoc. I think that’s the current gap in my knowledge. I’m (over)confident that I’ll get there.

My end goal is the same: one source Scrivener project that I can ship to many formats without manual monkeying. This is my “learner” book and I’m using it to figure out the prep process.


Oh hmm, I’ve never tried to force document structure using styles like that. As you say, it might be time to give more control over to Pandoc. You get a lot of goodies without much effort, and this is all a good example of that (with a little help from Memoir, too).

I would only consider switching over to styles for very unusual heading types, that perhaps are not part of the document hierarchy, like “See Also…” lead-ins to lists.

Basically I’d aim for something like this:

\frontmatter

## Acknowledgements
## Prologue

\mainmatter

# Part One
## Chapter One
### Section One
(etc.)

The division declarations for switching matter types are assumed to be coming either from the template or your binder items, as the case may be.

I would emphasise how clean and simple this is. No class spamming, no overrides, just marking down exactly how much you need to convey the book structure and letting the heavy-duty software take over the rest (and of course from your point of view this is all coming from neatly indented lists in the binder with probably largely structure-driven Section Types you don’t even have to think about).

Some of the magic that would be happening here is Pandoc, in converting these hashes to headings of the right depth, but some will be Memoir as well—for example if you check its documentation for what the \frontmatter declaration does, in its section on Logical Divisions, you’ll find it automatically suppresses numbering on headings and uses lowercase Roman numerals, resetting everything but various caption numbering for the main matter.[1]

You really don’t have to do much forcing, in other words, as it has sensible defaults. The main oddity here is using a level two heading for your front matter items, which you need because you’re using level one for parts (and you probably don’t want a full page spread for acknowledgements!).

With Scrivener you could accomplish this by creating a “Front Matter” section type, and then assigning that to a Layout that forces the hash level generation to two, regardless of the item’s depth in the binder, in the Title Options tab.

But a better approach might be to create a “Front Matter” folder (not to be confused with Scrivener’s feature for it) at the top of the draft, for these sections, which would automatically bump the hash level to two anyway, and then for the section layout that handles this container folder, plug the \frontmatter declaration into its Prefix tab, and the \mainmatter into its Suffix tab, with the option to place it after all child items. That would be even cleaner!

So a little setup, either way, but once you have that you can more creatively and freely outline in the Draft without really thinking about these technical details.

Here’s a little demo of that idea, in action, with the resulting PDF in the binder. It probably won’t compile for you without tweaking the command line or switching it off so you get pure .md, but the important stuff is all in the section types and layouts, anyway.

sample_frontmatter_compile.zip (233.4 KB)

You’ll note that by switching over to using hashes for headings instead of styles, you can probably dramatically simplify your section type/layout setup too. Since the compiler is adding hashes based on outline depth, and those map to \part, \chapter and so on automatically, we don’t need to spell out in great detail what we want. Your choices often narrow down to: does this need the concept of a heading, does it need text?

What I’m doing to debug right now is cutting Scrivener out of the loop and just processing a dead-simple markdown file in pandoc. I think that’s the current gap in my knowledge. I’m (over)confident that I’ll get there.

That’s almost always what I do when I’m figuring something out. I can punch UpArrow+Return on the last latexmk -xelatex test command line over and over, get it all right with tweaks to the .tex file, then move on to putting that solution into the Pandoc template or whatever, and once I’ve tested that it works well against Markdown source, make Scrivener create that. Working the other way around is very slow, as compile is by far the bottleneck, and it feels too much like remote control to me.

My end goal is the same: one source Scrivener project that I can ship to many formats without manual monkeying.

And getting the structure more semantic, less style-heavy, is I think going to go a ways toward that, even if it might seem like abstraction might make it more difficult—like I said above, there is a lot of magic that a lot of volunteers over the years have put together to make our lives easier.


  1. But if you don’t have the benefit of using Memoir, or your design case doesn’t fit its criteria, you can use ## Heading Title {.unnumbered} to essentially mimic much of this behaviour. It’s easy to make a Section Layout that does all of that for you.


I agree with @AmberV on the heading structure: you should let Scrivener generate the heading levels for Pandoc to ingest, and tweak from there. You can still use heading attributes added by the compiler to tag headings with styles or an id, {#identifier .class .class key=value key=value}, for HTML / TeX / DOCX etc. (see the Pandoc User’s Guide). So the styled span

[Part One: The First Part]{.part}

becomes something like:

## Part One: The First Part {.part .unnumbered}

This should work for both HTML and TeX. One really helpful trick is to use standard input to test pandoc conversion quickly. You write something like pandoc -t latex in the terminal, press [enter], type your markdown, then press [ctrl]+[d] to run the command on your entered fragments. So to see how LaTeX and HTML treat the heading above:

➪ pandoc -t latex

## Part One: The First Part {.part .unnumbered}

\subsection*{Part One: The First Part}\label{part-one-the-first-part}
\addcontentsline{toc}{subsection}{Part One: The First Part}
                                                                                                
 ➪ pandoc -t html

## Part One: The First Part {.part .unnumbered}

<h2 class="part unnumbered" id="part-one-the-first-part">Part One: The
First Part</h2>
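The same stdin trick can be scripted if you want to compare several output formats for one fragment. This is only a sketch: it assumes pandoc is on your PATH, and the helper names are hypothetical.

```python
import subprocess


def fragment_command(target):
    """Build the pandoc call that reads Markdown on stdin and writes `target`."""
    return ["pandoc", "--from", "markdown", "--to", target]


def convert_fragment(markdown, target):
    """Pipe a Markdown fragment through pandoc and return the converted text."""
    result = subprocess.run(
        fragment_command(target),
        input=markdown,
        capture_output=True,
        text=True,
        check=True,  # raise if pandoc reports an error
    )
    return result.stdout
```

Looping convert_fragment over "latex" and "html" with the heading above should reproduce the two listings in one step.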

Yep. My Makefile is something like

test:
	printf '# My part title\n\n## My chapter title\n' | pandoc -f markdown -d myconfig.yaml -t latex

Not exactly that, but the idea’s the same.

I’m pretty sure in this case I’ve got to specify the “unnumbered” class. The logical divisions, though, sound like they’ll require a template. I’ve already skimmed the pandoc -D latex output, and it’s not scary.

There are certain markdown classes I think are floated through. I remember being shocked that [foo]{.smallcaps} did what I expected (and wanted!) so it’s a matter of documentation for myself “how to solve this.”

  • Style hints feel like markdown
  • Doc structure feels like pandoc
  • LaTeX and pdflatex are the consumers

Get the upstream right, and downstream will follow.


My only motivation for my own title page was inserting an imprint graphic on it, which may require a pandoc template solution. I’m hoping not, since I’d like that imprint on my epub as well, and don’t want to also customize an epub template.

Ah! It appears to be a difference between the book and memoir documentclass. book looks like it correctly honors the documented no-numbering options and variables e.g. numbersections: false or --number-sections=false, but memoir does not if there’s \frontmatter at play. \mainmatter resets the numbers. That’s the reason for my doubled-up header situation.

Edit: revised as I better understand what’s up.

So, I have answers!

For structure:

  • I’ve made text documents in Scrivener that I manually assign to an As-Is type of layout
  • The documents contain either #!mainmatter or #!backmatter and I added string substitution to convert them to LaTeX \mainmatter and \backmatter on the way through. I’m not 100% sure this is necessary but it gives me a visual cue where it’s going to appear in the output
  • Cloned pandoc’s -D latex dump to a custom one without those commands, so they all come from $body
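For anyone who wants the marker substitution outside of Scrivener, here is a rough Python sketch that post-processes the compiled Markdown instead. The marker spellings are the ones from my documents; the function name and the rule that a marker must sit alone on its line are my own assumptions.

```python
import re

# Marker lines and the raw LaTeX commands they stand for.
MARKERS = {
    "#!mainmatter": "\\mainmatter",
    "#!backmatter": "\\backmatter",
}

# Only match a marker that is alone on its line (trailing spaces or tabs allowed).
MARKER_LINE = re.compile(r"^(#!mainmatter|#!backmatter)[ \t]*$", re.MULTILINE)


def replace_markers(markdown):
    """Swap each standalone marker line for its LaTeX command."""
    return MARKER_LINE.sub(lambda m: MARKERS[m.group(1)], markdown)
```

Because the commands end up on their own lines, Pandoc should pass them through as raw TeX when writing LaTeX. Markers mentioned in running text are left alone.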

For styling:

  • Include the appropriate “Front Matter” and “Back Matter” folders within Scrivener with the necessary magic docs inside, and place my special stuff in here as before: my full copyright page, my dedication page, etc.
  • In my pandoc template, fix up the Title page to include my imprint graphic
  • Also in the template and with much copypasta from StackOverflow and Reddit examples, define new environments for my custom stuff (the copyright, the dedication, etc.)
  • Finally, add a new Lua filter that looks for the :::{.fenced-div-classname} and converts the div markers into LaTeX begin/end pairs. I used divs heavily for CSS targets for the ebook, and now I can re-use for PDF
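My actual filter is Lua, but the idea is small enough to sketch in Python against Pandoc’s JSON AST, in case that is easier to read. Assumptions here: the div’s class name doubles as the LaTeX environment name, the environment names in the usage note are placeholders, and only divs at the top level (or nested inside other divs) are handled.

```python
def raw_latex(code):
    """Build a raw LaTeX block node in pandoc's JSON AST shape."""
    return {"t": "RawBlock", "c": ["latex", code]}


def convert_blocks(blocks, environments):
    """Replace matching fenced divs with begin/end pairs around their content."""
    out = []
    for block in blocks:
        if block.get("t") == "Div":
            # A Div node carries [attributes, child blocks];
            # attributes are [identifier, classes, key-value pairs].
            (ident, classes, keyvals), inner = block["c"]
            inner = convert_blocks(inner, environments)
            env = next((c for c in classes if c in environments), None)
            if env is not None:
                out.append(raw_latex("\\begin{%s}" % env))
                out.extend(inner)
                out.append(raw_latex("\\end{%s}" % env))
                continue
            block = {"t": "Div", "c": [[ident, classes, keyvals], inner]}
        out.append(block)
    return out


def convert_document(doc, environments):
    """Return a copy of the document with its top-level blocks converted."""
    doc = dict(doc)
    doc["blocks"] = convert_blocks(doc["blocks"], environments)
    return doc
```

To run it as a JSON filter you would load the AST from stdin with json.load, pass it through convert_document with something like {"copyright", "dedication"}, and json.dump the result to stdout; Pandoc calls such a script via --filter. Divs with other classes pass through untouched, so the ePub side keeps its CSS targets.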

When it’s all typed out like that, I guess I’ve done a lot! But I couldn’t have done it without the tips in this thread, so thank y’all very much! I’m down to tweaking the LaTeX environments and looking into specific options for ToC and footnote spacing, which is the kind of obsessive fine-tuning I prefer to “why is my dedication glued to the end of the copyright page?” drama.
