Compiling To LaTeX -- One file per section?

ptram · September 2, 2023, 11:18pm

Hi,

I would like to create a website using Quarto. Quarto can assemble Pandoc Markdown files into a website, starting from an index (similar to the Binder list of files) and a series of separate .md or .qmd files. The index tells the Quarto compiler how the separate files are to be arranged in the website, including their hierarchy.

If Scrivener could batch-compile each individual document into a separate .md file, and save the Binder list as a text file including the names of the files with different indents, all the files needed to create a Quarto website would be there.

Export would unfortunately not work, since Scrivener needs Compile for replacements, transformation and scripts.

Paolo

AmberV · September 2, 2023, 11:25pm

Have a look at the workflow given above. This one was built specifically to handle splitting a LaTeX project into multiple .tex files, so for a more general-purpose .md file splitter, see this post as well.

The main limitation I can think of at the moment is cross-referencing between sections that get split up.

ptram · September 3, 2023, 11:08am

This indeed worked for me. I used the proposed MD workflow, and the project is compiled to two separate .md files.

By adding them to a Quarto website, and manually entering the website structure in the _quarto.yml file, I got a website mirroring the Scrivener project.

Issues:

I don’t know where that hashtag after the index items comes from. It’s probably added by the script, while Quarto only wants the left hashtag for headings.
The website structure still has to be built by hand. Not a major issue, with the help of the workflow to export the Binder structure as a text file. All things considered, this is work that needs to be done once, and only revisited when the Scrivener project’s structure changes. Small updates to the project wouldn’t involve touching the structure manually entered in the _quarto.yml file.
As written somewhere in this thread, cross-references should be broken, and require manual rebuilding. For long projects, this can be a real pain. Unless I’ve missed a workaround.

Paolo

AmberV · September 3, 2023, 12:06pm

Glad to hear it’s a functional start.

Building the ToC YAML wouldn’t be too difficult. I could have a look at that if you haven’t figured it out yet, but just building a string in that first loop, as filenames are calculated, should be sufficient. The result of that would then need to be written out to some master file I suppose. I’m not familiar enough with Quarto off the top of my head to know what that looks like.
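As a rough illustration of the "build a string as filenames are calculated" idea, here is a minimal Ruby sketch. It is not taken from the actual splitter script: the Entry struct, the build_sidebar_config name and the sample titles are hypothetical, and it assumes a flat sidebar using Quarto's website/sidebar/contents keys with text and href items. Merging the result into an existing _quarto.yml is left out.

```ruby
require "yaml"

# Hypothetical record of what the splitting loop knows about each output file.
Entry = Struct.new(:title, :filename)

# Build a nested Hash matching the (assumed) flat Quarto sidebar layout:
# website -> sidebar -> contents -> [{ text:, href: }, ...]
def build_sidebar_config(entries)
  contents = entries.map do |entry|
    { "text" => entry.title, "href" => entry.filename }
  end
  { "website" => { "sidebar" => { "contents" => contents } } }
end

# Sample data standing in for the filenames calculated during the split.
entries = [
  Entry.new("Introduction", "1-Introduction.qmd"),
  Entry.new("Black Book", "2-BlackBook.qmd")
]

# Print the YAML; a real script would write it to whatever master file is chosen.
puts build_sidebar_config(entries).to_yaml
```

Going through a Hash and to_yaml rather than concatenating strings by hand leaves quoting and indentation to the YAML library, which matters once titles contain colons or quotes.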

> I don’t know where that hashtag after the index items comes from. It’s probably added by the script, while Quarto only wants the left hashtag for headings.

The script doesn’t touch text, other than to locate split markers and remove them.

That’s a setting you can disable in Scrivener. Previously it would always balance hashes on the left and right of the heading text, and still does when creating a blank new Format, or copying from any of the MultiMarkdown formats. But a release or two ago I did change all of the Pandoc specific formats to only use left side hashes. It’s not that Pandoc cannot handle them, but they can conflict with easily adding attributes to the line, a thing MMD doesn’t have syntax for. For example, ### Heading Title {.work-in-progress}. Quarto does use these, so there might be some bad interaction going on there.

To fix it, edit the compile Format, and in the Section Layouts pane, click the options button in the top-right corner and disable Add closing hashes to titles.

> As written somewhere in this thread, cross-references should be broken, and require manual rebuilding. For long projects, this can be a real pain. Unless I’ve missed a workaround.

Is that actually a problem with Quarto though? Since it encourages this multi-file approach, doesn’t its cross-referencing syntax handle that problem automatically? For example, their syntax is @fig-someimage, which presumably would create something like, <a href="targetfilename.html#fig-someimage">Reference text</a>.

Otherwise, it’s not a massive burden to form your links correctly in the first place; it’s something you’d be doing if you were using Markdown to create a site anyway. The only difference here is that you’re using some automation to go from a single-document output to a many-file output, rather than working in many files from the start. The main alteration you’d probably want to make is to remove the auto-numbering placeholder from the filename output in the Section Layout that generates it, so that your file names are predictable: “BlackBook.html” instead of “2-BlackBook.html”. Considering how linked placeholders work in Scrivener, you could even keep your syntax agile in the editor, with something like this:

[Refer to <$title>](<$title_no_spaces>.html#fig-someimage)

…where both of these placeholders are linked to the file generating level of the outline this figure descends from.

The alternative is to implement CommonMark/Pandoc compatible parsing for anything that might have an anchor generated for it, including Pandoc’s attribute strings, and build an array of that as the script scans the file, then also have parsing for any form of syntax used to create links, so that the calculated filenames can be prefixed to the URLs found within them—and as well avoid damaging any URLs that shouldn’t be prefixed.

That’s quite a lot of work for a simple little script like this! Especially when it can be solved by just doing things correctly to begin with rather than taking shortcuts with the link URL (or by using a processing system capable of handling cross-file link resolution). Last I checked, there were no Ruby gems for full-scale Pandoc Markdown parsing. There is plenty for integrating with Pandoc, wrappers for it and the like, but understandably there isn’t as much desire to essentially replace it.
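To make the scale of that alternative concrete, here is a deliberately naive Ruby sketch of the two passes described: collect anchors per file, then prefix link targets with the owning file. It is not part of the splitter script. The function names are hypothetical, and it assumes only explicit {#id} attributes and inline [text](#id) links. It ignores implicit heading identifiers, reference-style links and anything inside code blocks, which is exactly where the real work would be.

```ruby
# Pass 1: map each explicit {#id} anchor to the file that defines it.
# `files` is a Hash of filename => file text.
def collect_anchors(files)
  anchors = {}
  files.each do |filename, text|
    text.scan(/\{#([\w:.-]+)[^}]*\}/) { |(id)| anchors[id] = filename }
  end
  anchors
end

# Pass 2: rewrite bare "(#id)" link targets so they point at the HTML page
# generated from the file that owns the anchor. Unknown anchors and anchors
# in the current file are left untouched.
def prefix_links(text, anchors, current_file)
  text.gsub(/\]\(#([\w:.-]+)\)/) do
    id = Regexp.last_match(1)
    owner = anchors[id]
    if owner.nil? || owner == current_file
      "](##{id})"
    else
      "](#{File.basename(owner, '.*')}.html##{id})"
    end
  end
end

files = {
  "1-Intro.qmd" => "## Text {#sec-text}\n",
  "2-Other.qmd" => "See [the text](#sec-text).\n"
}
anchors = collect_anchors(files)
puts prefix_links(files["2-Other.qmd"], anchors, "2-Other.qmd")
```

Even this toy version needs two passes over every file, and each pattern it skips would need its own parsing rule.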

bernardo_vasconcelos · September 3, 2023, 2:09pm

@AmberV, for Quarto, there is no need to fix cross-reference markup this way. All one has to do is use its proper markup for the element being referenced.

If I had the following:

## Text {#sec-text}

Lorem...

Then the following reference would work anywhere else in the project (whether it is a Book or a Website):

So on and so on [@sec-text].

The same goes for Figures, Tables, Listings, and Equations, as well as Conjectures, Corollaries, Definitions, Examples, Exercises, Lemmas, Propositions, and Theorems.

AmberV · September 3, 2023, 2:13pm

Thanks for the confirmation. I figured they would be doing some magic, or if not already, it would be high on the development list, considering the design of the system.

One more reason to look into this system for production work in my opinion. Referencing has been “okay” in both MMD and Pandoc for some time, but with weaknesses in areas like multi-file management and uniformity of the syntax (like how MMD uses [...](#...) for images and [...][...] for tables).

bernardo_vasconcelos · September 3, 2023, 4:04pm

Indeed. I think the folks developing Quarto were smart to take advantage of existing programs (e.g. Pandoc) and existing filters (e.g. Pandoc Crossref).

To my surprise, one extra (unrelated) reason to use it, completely new to me until recently, is revealjs. This must be the best system that I have come across so far to create slide presentations. It is beautiful and cross-references work like they would in a Quarto Book or Website. I couldn’t ask for more out of it.

ptram · September 3, 2023, 6:02pm

Maybe the most interesting documentation on this issue is the auto-generation of a ToC in the Quarto guide:

Quarto Websites

I’m not sure I entirely got it. What I do know is that I would like the ToC or sidebar to be populated with the name of the actual heading, not of the file. The file names can’t be translated (the structure would become unclear to read), while the headings can be, and are.

Paolo

bernardo_vasconcelos · September 4, 2023, 12:22pm

If you add a title parameter, it will be used instead of the filename.

---
title: My Title
---

AmberV · September 4, 2023, 1:33pm

There are two ingredients that could make the ‘auto’ option better, from what I see, both of which could be added to the “New File” Section Layout in the Prefix tab:

>>>> <$n:fileID>-<$title_no_spaces>.qmd

---
title: "<$title>"
order: <$n#fileID>
---

A few tips:

If you name your Quarto website source folder with “-mmd” on the end of it, then compiling with this folder as your target will cause all graphics and split .qmd files to be deposited directly into it, rather than into a subfolder.
You will probably also want to disable the Title checkbox on the “New File” layout, as Quarto inserts an h1 itself. I got doubled up titles at first. h2 should be the highest hash heading output by Scrivener.
As in the original instructions, make sure to update the extension in the Text Layout compile format pane, so that footnotes, links and image references are merged into each split file.
The splitter.rb script should be modified to change the FILE_EXTENSION constant to ‘qmd’, and the conditional around line 16 that checks the file extension should be modified from filename[-3..-1] to filename[-4..-1], as the original only checked for ‘md’. I should make that better so that it uses a regex or something instead of a static file extension length. This is all kind of optional, I think? Quarto seemed to recognise .md files as well, so you could maybe skip all of that and leave the compile format using .md as it was. I like to do as the Romans do when in Rome, though.
Make sure the Delete source file after processing option is enabled in the Processing compile format pane, otherwise the pre-split .md file will end up in the website.
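On the extension check in the splitter.rb tip above, one way to avoid depending on a fixed extension length is to compare against File.extname. This is only a sketch: the real script is not shown in this thread, so apart from the FILE_EXTENSION constant mentioned above, the names here (target_file?) are hypothetical.

```ruby
# Extension of the files to split, without the leading dot.
FILE_EXTENSION = "qmd"

# True when the filename ends in ".<FILE_EXTENSION>", whatever the length of
# the extension; the comparison ignores case.
def target_file?(filename)
  File.extname(filename).casecmp?(".#{FILE_EXTENSION}")
end

# Only the first name matches.
%w[2-BlackBook.qmd notes.md README].each do |name|
  puts "#{name}: #{target_file?(name)}"
end
```

With this, switching between ‘md’ and ‘qmd’ only means changing the constant, and the slice indices no longer need to be kept in sync with it.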

I think that’s pretty much everything. I was able to rebuild the site with zero intermediate steps once I had a little test Quarto container set up.
