Compiling To LaTeX -- One file per section?

I think you might do better if you posted in the forum dedicated to MultiMarkdown and LaTeX.

Before I try to parse all this for myself, can I ask: is this the general method I should consider if my use case is as follows?

I have a Scrivener project with several rich-text documents that I want to end up as separate Markdown files on my machine, which I can later compile with Pandoc into separate PDFs. Each document in the Scrivener project has its own metadata block at the top, delimited by a pair of --- lines.

What is my best route to ending up with a markdown file for every document in the Scrivener project?

Well, one thing to consider right off the top, if you’re heading toward Pandoc with the intention of generating individual HTML files, is installing Pandoc 3 and taking a look at its new ‘chunkedhtml’ output format, used in conjunction with the --split-level=n option, which lets you choose the heading level at which to split into separate documents. That may be all you need. HTML is all it supports at the moment, though. I hope to see that broaden into a more general-purpose capability in the future.
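For reference, here is a small Ruby sketch of what such a chunked HTML run might look like. The input name compiled.md, the output name site and the helper chunked_html_command are placeholders of my own; my understanding is that chunkedhtml writes a folder of linked pages when -o has no .zip extension and a zip archive when it does, but check the Pandoc 3 manual before relying on that.

```ruby
# Sketch: assemble a Pandoc 3 command that splits one compiled Markdown file
# into linked HTML pages at the given heading level (Pandoc's default is 1).
def chunked_html_command(input, split_level: 1, output: 'site')
  ['pandoc', input, '-t', 'chunkedhtml', "--split-level=#{split_level}", '-o', output]
end

cmd = chunked_html_command('compiled.md', split_level: 2)
puts cmd.join(' ')
# system(*cmd)  # uncomment to run; needs Pandoc 3.x on the PATH
```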

Otherwise, go through the thread and let me know what further questions you have; splitting is covered there and demonstrated with samples, along with quite a few follow-up discussions on customisation and usage.

Thanks! Actually, my use case is running the following command to compile the individual .md files to PDFs:

for i in *.md; do echo "$i" && pandoc -s "$i" --template=letter.template.tex --pdf-engine=xelatex -o "${i%.md}.pdf"; done

All right, yeah, you’ll find a line in the sample project’s script where you can slot that in, if you essentially want to go straight from the compiler to a series of PDFs. If not, that line can be commented out.

The one other tweak I can think of is to how it handles metadata. It presumes each file will be using the same metadata block, as I recall, and so it captures that initially and then prepends it to each output file. It should be a simple matter to remove that insertion and let the binder items themselves drive their own metadata.

Thanks so much. I downloaded the sample project, and it split the documents under ‘Red book’ and ‘Black book’ just as you intended, as separate tex files. But I couldn’t seem to make the changes needed to compile and split several project documents into markdown files. Would I need to edit the script to accomplish this?

Yes, this was a very single-purpose demonstration. I think a better script+project example would push more of the configuration into the project and make it easier to switch post-processing on or off, perhaps by using the file extension more intelligently to allow mixed content out of one compile command. For example, the extension printed by the Section Layout’s Prefix tab could tell the script whether post-processing should be used for that specific chunk, and if so, to what file type.

I’ve never had time to come back to this and make it a more universally useful tool though.

But for your specific purpose, try the attached script instead:

splitter.rb (1.0 KB)

  • The metadata block capture and prepending into each sub-file is removed, since you stipulate putting metadata into each section that is meant to become a file.
  • The post-processing command is removed from the final loop (where footnotes and image references are added to each file), and replaced with a simple command to move the temp file into the working folder.
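The attached script isn’t reproduced here, but the core idea (split on the marker lines the New File prefix prints, append shared references, and prepend nothing so each chunk keeps its own metadata) can be sketched roughly as below. This is a hypothetical illustration, not splitter.rb itself: the ">>>> filename" marker format is taken from the Prefix example later in this thread, and MARKER, split_chunks and write_chunks are names I made up.

```ruby
# Hypothetical sketch, NOT the attached splitter.rb. Assumes every
# file-generating section starts with a marker line like ">>>> 1-RedBook.md".
MARKER = /^>>>> (.+?)\s*$/

# Split compiled text into { filename => body }. Text before the first
# marker is discarded; each body keeps its own --- metadata block.
def split_chunks(text)
  chunks = {}
  current = nil
  text.each_line do |line|
    if (m = MARKER.match(line))
      current = m[1]
      chunks[current] = +''
    elsif current
      chunks[current] << line
    end
  end
  chunks
end

# Write each chunk, appending the shared references (footnotes, links,
# images) but prepending nothing, so per-document metadata stays on top.
def write_chunks(chunks, references = '', dir = '.')
  chunks.each do |filename, body|
    File.write(File.join(dir, filename), body.lstrip + "\n" + references)
  end
end
```

The lstrip matters because Quarto, and Pandoc’s title handling, expect the metadata block at the very top of each file.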

As for what to change in the project:

  1. Edit the “MultiFile Output” format.
  2. Change the Section Layout: New File: Prefix tab to end in .md instead of .tex.
  3. Likewise fix the “references.tex” entry in the Text Layout pane.
  4. In Processing, tick Use Pandoc syntax and update the embedded script with the above content, or switch to external execution.

Thank you! Worked like a charm. I just had to add ‘Emphasis’ to the format’s style list.

Final question:

Is there any way to run my original Pandoc commands after the Ruby script runs? As a reminder, those commands were:

for i in *.md; do echo "$i" && pandoc -s "$i" --template=letter.template.tex --pdf-engine=xelatex -o "${i%.md}.pdf"; done

That should work as-is, right? Just drop it in as the last line of the script, wrapped in backticks to make it a system call.

How I would do it though is more like the first script, where the Pandoc call is placed into the final loop, replacing the original sample MultiMarkdown call, rather than (or maybe in addition to) moving the .md file from the tmp folder to the output folder. You’d have to rework things a bit though, since the filename variable includes the extension. Something like this:

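file_chunks.each_pair do |filename, tmpfile|
  # Append the shared footnote, link and image references to this chunk.
  references.rewind
  tmpfile.print references.read
  tmpfile.close
  # Strip the extension ("2-BlackBook.md" -> "2-BlackBook") to name the PDF.
  base_name = filename.sub(/\.\w+\z/, '')
  # Array-form system() bypasses the shell, so spaces in names are safe.
  system('pandoc', '-f', 'markdown', '-s', '--template=letter.template.tex',
         '--pdf-engine=xelatex', '-o', "#{base_name}.pdf", tmpfile.path)
  File.rename(tmpfile.path, filename)
end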

Of note, I’m specifying Markdown input (-f markdown) because the tmpfile naming scheme confuses Pandoc’s format detection. It does the right thing and falls back to Markdown, but it’s better to tell it explicitly that Markdown is the source format.

But like I say, I don’t see why the brute force method wouldn’t work either.
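The brute-force route can also be written in plain Ruby instead of a backticked shell loop, which avoids shell quoting problems with file names. A sketch, assuming pandoc and xelatex are on the PATH and letter.template.tex can be found from the working directory; pandoc_pdf_command and compile_all_pdfs are hypothetical helper names. Each output is named after its .md file with a .pdf extension.

```ruby
# Build the per-file Pandoc command (array form, so no shell quoting needed).
def pandoc_pdf_command(md_file)
  pdf_file = md_file.sub(/\.md\z/, '.pdf')
  ['pandoc', '-s', md_file, '--template=letter.template.tex',
   '--pdf-engine=xelatex', '-o', pdf_file]
end

# Run Pandoc once per .md file in dir, reporting any failures.
def compile_all_pdfs(dir = '.')
  Dir.glob(File.join(dir, '*.md')).sort.each do |md_file|
    puts md_file
    ok = system(*pandoc_pdf_command(md_file))
    warn "pandoc failed for #{md_file}" unless ok
  end
end
```

Calling compile_all_pdfs as the last line of the script, after the split files have been moved into place, would mirror the original loop.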


Hi,

I would like to create a website using Quarto. Quarto can assemble Pandoc Markdown files into a website, starting from an index (similar to the Binder list of files) and a series of separate .md or .qmd files. The index tells the Quarto compiler how the separate files are to be arranged in the website, including their hierarchy.

If Scrivener could batch-compile each individual document into a separate .md file, and save the Binder list as a text file including the names of the files with different indents, all the files needed to create a Quarto website would be there.

Export unfortunately wouldn’t work, since replacements, transformations and scripts only run during Compile.

Paolo

Have a look at the workflow given above. This one was built specifically to handle splitting a LaTeX project into multiple .tex files, so for a more general-purpose .md file splitter, see this post as well.

The main limitation I can think of at the moment is cross-referencing between sections that get split up.

This indeed worked for me. I used the proposed MD workflow, and the project is compiled to two separate .md files.

By adding them to a Quarto website, and manually entering the website structure in the _quarto.yml file, I got a website mirroring the Scrivener project:

Issues:

  • I don’t know where that hashtag after the index items comes from. It’s probably added by the script, while Quarto only wants the left hashtag for headings.

  • The website structure still has to be made by hand. Not a major issue, with the help of the workflow to export the Binder structure as a text file. All considered, this only needs to be done once, and only changed when the Scrivener project’s structure changes. Small updates to the project wouldn’t involve touching the structure manually entered in the _quarto.yml file.

  • As written somewhere in this thread, cross-references will be broken and require manual rebuilding. For long projects, this can be a real pain. Unless I’ve missed a workaround.

Paolo


Glad to hear it’s a functional start.

Building the ToC YAML wouldn’t be too difficult. I could have a look at that if you haven’t figured it out yet, but just building a string in that first loop, as filenames are calculated, should be sufficient. The result of that would then need to be written out to some master file I suppose. I’m not familiar enough with Quarto off the top of my head to know what that looks like.
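As a rough illustration of that idea, the sketch below turns [filename, title] pairs, as they might be collected in the first loop, into a Quarto sidebar block that uses each title as the link text. It assumes Quarto’s website.sidebar.contents list accepts href/text entries; sidebar_yaml is a hypothetical name, nesting via section entries is left out, and the result would still need merging into the rest of _quarto.yml.

```ruby
require 'yaml'

# Hypothetical input: [filename, title] pairs, collected in the splitter's
# first loop as each output filename is calculated.
def sidebar_yaml(entries)
  contents = entries.map { |file, title| { 'href' => file, 'text' => title } }
  { 'website' => { 'sidebar' => { 'contents' => contents } } }.to_yaml
end

puts sidebar_yaml([['1-RedBook.qmd', 'Red Book'], ['2-BlackBook.qmd', 'Black Book']])
```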

I don’t know where that hashtag after the index items comes from. It’s probably added by the script, while Quarto only wants the left hashtag for headings.

The script doesn’t touch text, other than to locate split markers and remove them.

That’s a setting you can disable in Scrivener. Previously it would always balance hashes on the left and right of the heading text, and it still does when creating a blank new Format or copying from any of the MultiMarkdown formats. But a release or two ago I changed all of the Pandoc-specific formats to use left-side hashes only. It’s not that Pandoc cannot handle closing hashes, but they get in the way of easily adding attributes to the heading line, something MMD has no syntax for. For example: ### Heading Title {.work-in-progress}. Quarto does use these, so there might be some bad interaction going on there.

To fix it, edit the compile Format, and in the Section Layouts pane, click the options button in the top-right corner and disable Add closing hashes to titles.
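If you already have compiled files with closing hashes and would rather clean them up than recompile, a post-processing regex along these lines could work. It is a sketch of my own, not part of the splitter: it only handles ATX headings, keeps a trailing Pandoc attribute block, and would also alter any "# ... #" lines inside fenced code blocks, so use it with care.

```ruby
# Matches an ATX heading with closing hashes, optionally followed by a
# Pandoc attribute block, e.g. "### Title ### {.wip}".
CLOSING_HASHES = /^(#+[ \t]+.*?)[ \t]+#+([ \t]*\{[^}\n]*\})?[ \t]*$/

# Remove closing hashes while keeping any attribute block after them.
def strip_closing_hashes(text)
  text.gsub(CLOSING_HASHES, '\1\2')
end

puts strip_closing_hashes("### Heading Title ### {.work-in-progress}")
```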

  • As written somewhere in this thread, cross-references will be broken and require manual rebuilding. For long projects, this can be a real pain. Unless I’ve missed a workaround.

Is that actually a problem with Quarto, though? Since it encourages this multi-file approach, doesn’t its cross-referencing syntax handle that problem automatically? For example, their syntax is @fig-someimage, which presumably would create something like <a href="targetfilename.html#fig-someimage">Reference text</a>.

Otherwise, it’s not a massive burden to form your links correctly in the first place. It’s something you’d be doing if you were using Markdown to create a site anyway; the only difference here is that you’re using some automation to go from a single-document output to a many-file output, rather than working in many files from the start. The main change you’d probably want to make is to remove the auto-numbering placeholder from the filename in the Section Layout that generates the files, so that your file names are predictable: “BlackBook.html” instead of “2-BlackBook.html”. Considering how linked placeholders work in Scrivener, you could even keep your syntax agile in the editor, with something like this:

[Refer to <$title>](<$titlenospaces>.html#fig-someimage)

…where both of these placeholders are linked to the file generating level of the outline this figure descends from.

The alternative is to implement CommonMark/Pandoc-compatible parsing for anything that might have an anchor generated for it, including Pandoc’s attribute strings, and build an array of those anchors as the script scans the file. You would then also need parsing for every form of link syntax, so that the calculated filenames can be prefixed to the URLs found within them, while avoiding damage to any URLs that shouldn’t be prefixed.

That’s quite a lot of work for a little simple script like this! Especially when it can be solved by just doing things correctly to begin with rather than taking shortcuts with the link URL (or using a processing system capable of handling cross-file link resolution). Last I checked, there are no Ruby gems for full-scale Pandoc markdown parsing. Plenty of stuff for integrating with it, wrappers for it and the like, but understandably there isn’t as much of a desire to essentially replace it.
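To give a sense of why this gets heavy, here is a deliberately naive Ruby sketch of that alternative. It only recognises explicit {#id ...} attributes and inline [text](#id) links, and assumes each chunk becomes an .html file of the same base name; implicit heading IDs, reference-style links, raw HTML and code blocks are all ignored, which is exactly the gap a real parser would have to fill. build_anchor_index and prefix_links are hypothetical names.

```ruby
# Naive patterns: explicit {#id ...} attributes and inline [text](#id) links.
ANCHOR_ATTR = /\{#([\w:.-]+)[^}]*\}/
LOCAL_LINK  = /\]\(#([\w:.-]+)\)/

# Map each explicit anchor ID to the HTML file its chunk will become.
def build_anchor_index(chunks)
  index = {}
  chunks.each do |filename, text|
    html_name = filename.sub(/\.\w+\z/, '.html')
    text.scan(ANCHOR_ATTR).flatten.each { |id| index[id] = html_name }
  end
  index
end

# Prefix "#id" links with their target file, unless the anchor is local.
def prefix_links(text, index, current_html)
  text.gsub(LOCAL_LINK) do
    id = Regexp.last_match(1)
    target = index[id]
    target && target != current_html ? "](#{target}##{id})" : Regexp.last_match(0)
  end
end
```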


@AmberV, for Quarto, there is no need to fix cross-reference markup this way. All one has to do is use its proper markup for the element being referenced.

If I had the following:

## Text {#sec-text}

Lorem...

The following would then work anywhere else in the project (whether it’s a Book or a Website):

So on and so on [@sec-text].

The same goes for Figures, Tables, Listings, and Equations, as well as Conjectures, Corollaries, Definitions, Examples, Exercises, Lemmas, Propositions, and Theorems.
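Because those references resolve across files, a quick sanity check after splitting could be useful: list any @prefix-id reference that no file defines with {#prefix-id}. The prefix list below is my assumption of Quarto’s cross-reference prefixes, and labels defined in other ways (for example via code-cell options) are not detected; dangling_xrefs is a hypothetical helper, not a Quarto feature.

```ruby
# Assumption: Quarto's cross-reference prefixes; adjust as needed.
XREF_PREFIXES = %w[sec fig tbl lst eq thm lem cor prp cnj def exm exr].freeze
XREF_USE = /@((?:#{XREF_PREFIXES.join('|')})-[\w-]+)/
XREF_DEF = /\{#((?:#{XREF_PREFIXES.join('|')})-[\w-]+)/

# Return IDs referenced with @prefix-id in any file but defined in none.
def dangling_xrefs(files)
  defined = files.values.flat_map { |text| text.scan(XREF_DEF).flatten }
  used    = files.values.flat_map { |text| text.scan(XREF_USE).flatten }
  (used - defined).uniq
end
```

Citation keys such as @smith2020 are skipped because they don’t start with one of the listed prefixes.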


Thanks for the confirmation. I figured they would be doing some magic, or if not already, it would be high on the development list, considering the design of the system.

One more reason to look into this system for production work in my opinion. Referencing has been “okay” in both MMD and Pandoc for some time, but with weaknesses in areas like multi-file management and uniformity of the syntax (like how MMD uses [...](#...) for images and [...][...] for tables).


Indeed. I think the folks developing Quarto were smart to take advantage of existing programs (e.g. Pandoc) and existing filters (e.g. Pandoc Crossref).

To my surprise, one extra (unrelated) reason to use it, completely new to me until recently, is revealjs. This must be the best system that I have come across so far to create slide presentations. It is beautiful and cross-references work like they would in a Quarto Book or Website. I couldn’t ask for more out of it.


Maybe the most interesting documentation on this issue is the auto-generation of a ToC in the Quarto guide:

Quarto Websites

I’m not sure I entirely got it. What I do know is that I would like the ToC or sidebar to be populated with the actual heading text, not the file name. File names can’t be translated (it would make the structure hard to read), whereas headings can be, and are.

Paolo

If you add a title parameter, it will be used instead of the filename.

---
title: My Title
---

There are two ingredients that could make the ‘auto’ option better, from what I see, both of which could be added to the “New File” Section Layout in the Prefix tab:

>>>> <$n:fileID>-<$title_no_spaces>.qmd

---
title: "<$title>"
order: <$n#fileID>
---

A few tips:

  • If you name your Quarto website source folder with “-mmd” on the end of it, then compiling with this folder as your target will cause all graphics and split .qmd files to be written directly into it, rather than into a subfolder.
  • You will probably also want to disable the Title checkbox on the “New File” layout, as Quarto inserts an h1 itself; I got doubled-up titles at first. h2 should be the highest heading level output by Scrivener.
  • As in the original instructions, make sure to update the extension in the Text Layout compile format pane, so that footnotes, links and image references are merged into each split file.
  • The splitter.rb script should be modified to change the FILE_EXTENSION constant to ‘qmd’, and the conditional around line 16 that checks the file extension should be changed from filename[-3..-1] to filename[-4..-1], as the original only checked for ‘md’. I should make that better so that it uses a regex or something instead of a fixed extension length. All of this is probably optional, though: Quarto seemed to recognise .md files as well, so you could maybe skip it and leave the compile format using .md as it was. Still, I like to do as the Romans do when in Rome.
  • Make sure the Delete source file after processing option is enabled in the Processing compile format pane, otherwise the pre-split .md file will end up in the website.
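On the regex idea for the extension check, something like the following would accept .qmd and .md regardless of their length. It is a sketch only: the real conditional in splitter.rb may be structured differently, and SPLIT_EXTENSIONS and split_target? are hypothetical names.

```ruby
# Extensions the splitter should treat as split targets (assumed list).
SPLIT_EXTENSIONS = %w[qmd md].freeze
EXTENSION_PATTERN = /\.(?:#{SPLIT_EXTENSIONS.map { |e| Regexp.escape(e) }.join('|')})\z/i

# True when filename ends in one of the listed extensions, whatever its length.
def split_target?(filename)
  EXTENSION_PATTERN.match?(filename)
end
```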

I think that’s pretty much everything. I was able to rebuild the site with zero intermediate steps once I had a little test Quarto container set up.

