Compiling chapters to individual files

TedC · June 2, 2021, 8:52am

Hi there,

We’re launching a book I recently finished to BETA readers this week. My team put together a clever automated email and feedback system, which is very exciting. However, they’ve asked me to provide all 33 chapters of the book as individual PDF and Mobi files. I can, of course, compile each chapter manually, but does Scrivener have some slick way to automate the process?

Thanks,
Ted

rms · June 2, 2021, 8:59am

There may be a “slick” way, but I’m not aware of one, and I may be wrong. Others may have suggestions.

To help you get going in case there is no slick way: if I were tasked with this, I’d make one big PDF, then copy it 32 more times, naming each copy for its chapter. Then, with Preview (or another PDF editor), I’d open each of the 33 files and delete the pages that don’t belong to that chapter. It can’t take that long, and it is probably faster than compiling each chapter in Scrivener. Or it may be a wash.

For the Mobi files, I’m guessing you are stuck doing each one individually until a slick way arrives.

AmberV · June 2, 2021, 11:22am

It basically boils down to this:

If you use Scrivener like a word processor, and compile out formatted documents using its built-in converters, like Kindle and PDF—then no. You’re going to have to figure out a way of doing that yourself after you compile. For PDF, there are good tools for that (including the Mac’s own free Preview.app). For Mobi, that’s going to be more awkward, you’d probably end up using something like Calibre in conjunction with a compile format Scrivener can produce that is easy to chop up. Ebooks on their own are not easy to chop up. So I would recommend maybe RTF for this. Calibre makes bulk conversion a snap once you get it set up, so there may be a little initial investment, but once you have things the way you want them, it should be fairly simple to pump out updates. The hardest part will probably be snipping the files up by hand in Preview/LibreOffice.
If on the other hand you use Scrivener as an authoring platform for some manner of plain-text markup, like Markdown, then the answer is yes. Because really at that point the answer is YES to just about anything you can imagine doing—because plain-text and Markdown files can be automatically post-processed using full programming languages and system tools. The compiler is, essentially, fully programmable.

To that end, here is a file splitter example that I put together for someone. In this case the request was to make multiple LaTeX documents bound together by a single index file at the top, but given how Markdown (and other markup languages) can be transformed into almost any format you would ever need, including ebooks, it would be trivial to change this example project to make 33 .mobi files.

Something worth considering is that with Scrivener you don’t have to change your writing methods to tap into most of what this more flexible path can do. In fact, I may not recall correctly, but I think that example project may even have presumed a rich text workflow at the top, being converted to Markdown and from there into multiple LaTeX files. Even if not, it’s a single checkbox to try it out, and you may have to tweak less than you’d think to get it working well (for example, relying more heavily on styles than you may be used to, rather than on ad hoc formatting, which is generally meaningless to semantic systems like Markdown).

TedC · June 2, 2021, 11:42am

rms and AmberV,

Thanks for the replies!

I’m looking at the file splitter Ruby script. I think I can probably get it set up and working. But it looks to me like it will produce split Markdown or LaTeX files. How do I then get those into my final format? Is that also a Calibre job? Or am I missing something key in the Ruby script?

Thanks!

AmberV · June 2, 2021, 12:28pm

So firstly, what I would recommend is looking into Pandoc as your conversion tool, rather than MultiMarkdown. It’s a robust system, it is well-documented, and most importantly it is capable of producing ePub files. Those ePub files could then in turn be processed through KindleGen (which can be found buried in Kindle Previewer, and is exactly what Scrivener uses). Assuming a standard install location, the path to the kindlegen script is:

'/Applications/Kindle Previewer 3.app/Contents/lib/fc/bin/kindlegen'

Now as for the structure of the Ruby script itself, there are two main blocks within it:

The first block extends down to line 26, and that’s the basic file splitter routine. You probably wouldn’t have to mess with that much if at all. It builds out the files to a temp folder, and adds the filenames to an array for further processing.
The actual conversion (in this case to LaTeX) is handled in the shorter loop starting on line 29. In that, the main thing to change is line 34, which currently converts each text file to LaTeX format.
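
For anyone who wants to see the shape of that first block without opening the sample project, here is a minimal sketch of a splitter routine. It is not the sample project's actual script: the marker format (a line such as ">>>> Chapter-01.md", modelled on the prefix that comes up later in this thread) and the name split_chapters are assumptions for illustration.

```ruby
require "tmpdir"

# Assumed marker: a line such as ">>>> Chapter-01.md" starts a new file.
MARKER = /\A>>>> (.+)\z/

# Split compiled text into one file per marker inside out_dir.
# Returns the paths written, in document order.
def split_chapters(text, out_dir)
  paths = []
  current = nil
  text.each_line(chomp: true) do |line|
    if (m = MARKER.match(line))
      current&.close
      path = File.join(out_dir, m[1].strip)
      current = File.open(path, "w:UTF-8")
      paths << path
    elsif current
      current.puts(line)
    end
    # Lines before the first marker are ignored in this sketch.
  end
  current&.close
  paths
end
```

Calling split_chapters(compiled_text, Dir.mktmpdir) would give you an array of file paths, which is the sort of list the second, shorter conversion loop then walks through.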

For Mobi you would need at least two commands: the first would use Pandoc to generate the .epub file, and the second would use kindlegen to generate the .mobi file. It would take a little experimentation to get the flags set the way you want, and you may also want to experiment with the documented support files Pandoc can use to pretty it up a bit with CSS and such.
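
As a rough sketch of what those two commands could look like from inside a Ruby script: the helper names mobi_commands and build_mobi are hypothetical, the Pandoc flags are the bare minimum, and the kindlegen path is the one quoted above. Passing each command as an argument array sidesteps quoting trouble with the space in "Kindle Previewer 3.app".

```ruby
# Path quoted earlier in the thread; adjust if Kindle Previewer lives elsewhere.
KINDLEGEN = "/Applications/Kindle Previewer 3.app/Contents/lib/fc/bin/kindlegen"

# Build the two commands for one chapter as argument arrays.
def mobi_commands(markdown_path)
  base = File.basename(markdown_path, ".*")
  epub = File.join(File.dirname(markdown_path), "#{base}.epub")
  [
    # Step 1: Markdown to ePub with Pandoc.
    ["pandoc", "-f", "markdown", "-t", "epub3", "-o", epub, markdown_path],
    # Step 2: ePub to Mobi. kindlegen's -o takes a bare file name and
    # writes next to the input file.
    [KINDLEGEN, epub, "-o", "#{base}.mobi"]
  ]
end

# Run both steps in order, stopping at the first one that fails.
def build_mobi(markdown_path)
  mobi_commands(markdown_path).all? { |cmd| system(*cmd) }
end
```

One caveat: I believe kindlegen can exit non-zero when it only has warnings, so a false result from build_mobi is worth checking by hand before assuming the .mobi file is missing.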

I find it is best to get these commands fine-tuned in Terminal, and once I get a result I like, I implement them into the script and give it a test in the compiler.

Pandoc also does PDF, though it requires a LaTeX workflow to be in place, and unless you’re already familiar and comfortable with that, or are happy with extremely vanilla output, I would recommend a different approach. One option is to use Pandoc to create .docx files and then convert those to PDF with some other tool. Maybe Calibre can do that, but I bet there are some good solutions out there. It may not be something you can fully automate, but having 33 .docx files ready to go would be a big leap ahead of having to compile 33 times, or spending an hour in Preview splitting up a PDF file.

TedC · June 3, 2021, 8:49am

Thanks AmberV, I appreciate all the help.

Your test file seems to work perfectly. I compile it, and it comes out split before I even execute the ruby script. (Is it being called as part of the compile somehow?)

However, I’m not able to get your compile preset into my project. I’ve used the export and import feature, tried dragging it from one project to the other, and even replicating each setting pane by pane. I can’t seem to do it. Am I missing an obvious step?

TedC · June 3, 2021, 8:56am

I was looking in PDF export instead of in MultiMarkdown.

Now, I’m getting a compile error:

/var/folders/m1/qbytygbn28d6pw0qm0rh076w0000gn/T/my-script:11:in `split': invalid byte sequence in US-ASCII (ArgumentError)
	from /var/folders/m1/qbytygbn28d6pw0qm0rh076w0000gn/T/my-script:11:in `<main>'

Is my-script the splitter.rb script? Or is it something else?

AmberV · June 3, 2021, 9:33am

The solution to that can be found further down in the thread.

TedC · June 3, 2021, 10:35am

Thanks for all the help, AmberV. I’ve got it working now.

I ran into the same issue as the person at the bottom of the other thread once I installed RonJeffries’ fix.

/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/delegate.rb:349:in `write': U+00E9 from UTF-8 to US-ASCII (Encoding::UndefinedConversionError)
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/delegate.rb:349:in `print'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/delegate.rb:349:in `block in delegating_block'
	from /var/folders/m1/qbytygbn28d6pw0qm0rh076w0000gn/T/my-script:25:in `block in <main>'
	from /var/folders/m1/qbytygbn28d6pw0qm0rh076w0000gn/T/my-script:12:in `each'
	from /var/folders/m1/qbytygbn28d6pw0qm0rh076w0000gn/T/my-script:12:in `<main>'

But simply disabling the script and running it from the command line gets me going.

I also had a hang-up with my chapter titles. They all included punctuation like colons and apostrophes. My fix was simply to change the prefix to:

>>>> Chapter-<$n:fileID>.tex

Not as elegant, but it gets the job done. Again, thanks so much for all the help!
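
If keeping the titles in the filenames matters, another option would be to clean them up inside the script before the files are written. A minimal sketch, where safe_stem is a hypothetical helper and not part of the sample project:

```ruby
# Turn a chapter title into a conservative filename stem.
def safe_stem(title)
  stem = title.gsub(/['’]/, "")            # drop apostrophes: "Ted's" -> "Teds"
              .gsub(/[^\p{Alnum}]+/, "-")  # any other run of punctuation or space -> one hyphen
              .gsub(/\A-+|-+\z/, "")       # trim hyphens left at either end
  stem.empty? ? "untitled" : stem
end
```

With this, a title like "Chapter 1: Ted's Plan" becomes "Chapter-1-Teds-Plan", to which the script can append whatever extension it needs.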

AmberV · June 3, 2021, 10:48am

Let’s see, try commenting out the previously suggested line, and place this one toward the very top of the script, instead:

Encoding.default_external = Encoding::UTF_8

That is what I am currently using in my script that post-processes the user manual Markdown file, which contains quite a lot of Unicode, and I don’t run into any issues with it.
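
If you would rather not change the process-wide default, a narrower alternative is to name the encoding on each read and write. This is only a sketch; read_utf8 and write_utf8 are hypothetical helper names, and the sample project does not use them.

```ruby
# Read a file as UTF-8 regardless of the locale the script is launched with.
def read_utf8(path)
  File.read(path, encoding: "UTF-8")
end

# Write text as UTF-8 regardless of the locale the script is launched with.
def write_utf8(path, text)
  File.write(path, text, encoding: "UTF-8")
end
```

The one-line directive above is less typing and covers every file the script touches, so this per-call form is mainly useful when only one or two files need the treatment.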

Disclaimer: I am using Ruby 3.0.1, installed via homebrew. I’m not positive of the compatibility with 2.6—but I’m pretty sure I switched to using this directive before upgrading my local system.

Let me know how it goes, I should update the sample project so that other people do not continually run into this issue.

TedC · June 3, 2021, 11:06am

That fixed the error!

However, I seem to be losing bold, italics and tab spacing. Is that a setting I’m skipping over somewhere?

AmberV · June 3, 2021, 11:08am

Great! Thanks for the confirmation that it works fine in the older system installed Ruby, I’ll get that sample fixed with that adjustment.

As for basic formatting, make sure you have the Convert rich text to MultiMarkdown setting ticked, in the General Options tab of the compile overview screen (it works for the Pandoc dialect as well; the label just doesn’t update if you have the Pandoc syntax flag ticked in the Processing compile format pane).

That said, tab stops are completely meaningless to Markdown. What is it you are attempting to accomplish with them, in a semantic sense (as that is how you have to think with Markdown)?

TedC · June 3, 2021, 11:14am

Yep. That was the problem for bolds and italics.
I just think the paragraphs are easier to read when they start with a tab in. Maybe that will all be sorted by my eventual output, but I don’t see any indicators in the .tex files.

AmberV · June 3, 2021, 11:26am

Oh yeah, with ePub you’re going to want to use CSS to format paragraph indenting. There are instructions for how to set that up in the Pandoc documentation. It’s pretty easy to do: just put your .css file somewhere related to your output workflow, and supply the full path to it on your command line as instructed. While you’re in that part of the docs, note you can also set up ePub metadata for stuff like author and title.

As a hint, you might want to examine Scrivener’s built-in Ebook compile format’s CSS pane. It has a pretty good setup for paragraph indent handling, which will suppress first-line indents automatically where it is traditional to do so (after major headings, etc.). In fact a lot of that stock CSS may be nice to dump into your custom .css files, though some of it may require a little tweaking since Scrivener’s HTML output is different from Pandoc’s, internally.

As for DOCX/PDF, yeah that kind of detailing should be handled for you. With Pandoc’s DOCX, look up instructions on how to create a default stylesheet. It’s pretty easy to do, and you can add personal touches so it doesn’t look like basic default Word—but I think the default template may be indented (it’s been a while since I’ve used defaults!).

A general rule of thumb: with Markdown you don’t worry about these kinds of detailing issues while writing. A paragraph is simply a paragraph, as indicated by a clear line of space around it. How that paragraph is formatted some day is a matter for the conversion and stylesheet mechanisms.

TedC · June 4, 2021, 10:18am

Hey AmberV,

I now have 33 lovely epubs, and a preset that can reproduce them at will.
My final output line looks like this in case it’s useful to someone else:

pandoc -f markdown_mmd -t epub3 --epub-cover-image=</path/to/cover.jpg> --css=</path/to/epub.css> -o #{filename} #{tmpfile.path}

I used the Scrivener epub.css without any edits at all.

I’m wondering about metadata editing. I see, for example, that each of my documents (in addition to having title, author, etc) has this line:

base header level: 2

It’s not a big deal, but I would love to get rid of it. I also think I could have pandoc produce a very acceptable pdf, if I could only add one additional line of latex:

\setlength\parindent{24}

Is there a way to add it into the header in Scrivener?

AmberV · June 4, 2021, 12:18pm

Great! Thanks for posting the solution for anyone else that may need it.

One thing you may want to fix: it looks like you are using -f markdown_mmd, which is what you would need if you were generating MultiMarkdown from Scrivener. If you tick the Use Pandoc syntax checkbox at the top of the Processing pane, you probably don’t want it set that way.

The “base header level” metadata should be inert with Pandoc. It does have a similar mechanism (basically this setting would treat top level headings “#” as second level headings “##” for purposes of conversion, meaning top level becomes <h2>), but it is driven by a command-line flag. As for where it is coming from, I’m not sure, but the two places in Scrivener it would come from are the Metadata tab in compile overview, or the same named pane in the Format designer.
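
If the line turns out to be hard to track down in the compile settings, a fallback would be to strip it from each split file before conversion. The sketch below assumes the metadata is a block of "Key: value" lines at the very top of the file, ending at the first blank line; drop_metadata_key is a hypothetical helper name.

```ruby
# Remove one key from a leading "Key: value" metadata block.
# Assumes the block ends at the first blank line; text after it is untouched.
def drop_metadata_key(text, key)
  head, sep, body = text.partition(/\n[ \t]*\n/)
  kept = head.lines.reject { |l| l =~ /\A#{Regexp.escape(key)}\s*:/i }
  kept.join + sep + body
end
```

Calling drop_metadata_key(chapter_text, "base header level") would remove that one line and leave the title, author and so on in place. A file with no metadata block at all has its first paragraph treated as the block, so this is only suitable for files you know start with metadata.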

TedC wrote: “It’s not a big deal, but I would love to get rid of it. I also think I could have pandoc produce a very acceptable pdf, if I could only add one additional line of latex…”

I agree, I much prefer indented paragraphs to spacing. This is fortunately very easy to fix. Just edit the compile format, and in the Metadata pane, add a key for indent and set its value to true.

If you’re curious as to what that actually does, particularly if you want to change the indent amount, or simply make it so Pandoc doesn’t generate LaTeX that way for any documents going forward, read on.

Pandoc customisation

So where does ‘indent’ come from and how can you tweak it? Pandoc is very customisable! Here’s a simple way to try it out, which will override the default LaTeX template (and so PDF output as well) for all Pandoc conversions. You can read more about this in the Templates chapter of the Pandoc manual. If you wanted different output templates for different projects, you can do that as well.

$ mkdir -p ~/.pandoc/templates
$ pandoc -D latex > ~/.pandoc/templates/default.latex

In the resulting file, you’ll note a lot of Pandoc-specific conditional template code that is pretty easy to read around. Basically for this specific thing you just want to find where it is overriding the default LaTeX look and remove that bit. What you’re looking for should look like this:

$if(indent)$
$else$
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
  \IfFileExists{parskip.sty}{%
    \usepackage{parskip}
  }{% else
    \setlength{\parindent}{0pt}
    \setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
  \KOMAoptions{parskip=half}}
\makeatother
$endif$

So that is how you can look for a document variable (indent: true in this case) and modify the output of the preamble. You could also just delete this entire section and leave the paragraph policy at the LaTeX default no matter what.

But that’s going to be useful if you want to tweak any aspect of the output, naturally. There are further things you can do, such as creating your own filter (Lua scripting), which gives you complete control over the syntax level output as well—i.e. what happens when you italicise something in Scrivener can be changed from \emph{text} to your own macro, etc.

Brix · September 2, 2021, 11:06am

I want each folder / file to be compiled into its own document, e.g. ODT (instead of all together in a single file). How could I achieve that?