Scrivener → Squarto

UPDATE 2024-09-23:

It took me a while, but I figured out how to add a feature to Squarto that extracts the Style info from the .scriv package, specifically from three sources:

  • .scriv/Files/styles.xml
  • .scriv/Settings/Compile Formats/Squarto.scrformat
  • .scriv/Files/Data/⟨UUID⟩/content.styles – each content.rtf file has a corresponding content.styles file if any Styles were applied to that document in Scrivener’s editor.

It involves a bunch of juggling to build the proper data structure, because the information is spread across all three sources. The styles.xml file encodes the style UUIDs. The .scrformat file encodes which prefix/suffix replaces each style (set up in Scrivener’s Compile settings when editing the compile format). Each content.styles file is essentially a comma-separated value (CSV) list of the style UUIDs used in the corresponding content.rtf, and a UUID’s index position in that list is the number that appears in Scrivener’s style markup tags. For example, <$Scr_Cs::0> refers to the UUID in position 0.
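A minimal sketch of the last step of that juggling, assuming the styles.xml and .scrformat data have already been merged into a dict that maps each style UUID to a (prefix, suffix) pair. Two further assumptions: content.styles is treated as a plain comma-separated list of UUIDs, and a closing tag is taken to look like <!$Scr_Cs::N> (with Ps in place of Cs for paragraph styles). The function names and sample UUIDs are hypothetical, not Squarto’s actual code.

```python
import re

# Assumed tag shapes: <$Scr_Cs::N> opens a character style, <!$Scr_Cs::N> closes it;
# paragraph styles are assumed to use the same pattern with Ps.
STYLE_TAG = re.compile(r"<(!?)\$Scr_(?:Cs|Ps)::(\d+)>")


def load_style_index(content_styles_text):
    """Turn a content.styles file (assumed: comma-separated UUIDs) into a list,
    so that list position N matches the N in <$Scr_Cs::N>."""
    text = content_styles_text.strip()
    # Empty entries are kept (not filtered) so later indices don't shift
    return [uuid.strip() for uuid in text.split(",")] if text else []


def apply_styles(text, style_index, replacements):
    """Replace Scrivener style tags with Markdown prefix/suffix strings.

    style_index:  list of style UUIDs, in content.styles order
    replacements: dict UUID -> (prefix, suffix), e.g. merged from
                  styles.xml and the .scrformat compile format
    """
    def substitute(match):
        is_closing, position = match.group(1), int(match.group(2))
        uuid = style_index[position]
        prefix, suffix = replacements.get(uuid, ("", ""))
        # An opening tag becomes the prefix, a closing tag the suffix
        return suffix if is_closing else prefix

    return STYLE_TAG.sub(substitute, text)


# Hypothetical example: style 1 is an inline-code character style
index = load_style_index("AAA-111,BBB-222\n")
print(apply_styles("Some <$Scr_Cs::1>code<!$Scr_Cs::1> here", index, {"BBB-222": ("`", "`")}))
# prints: Some `code` here
```

Because the lookup goes through the UUID rather than the index number alone, editing a Style’s prefix/suffix in the compile format only changes the `replacements` dict, which fits the “not hardcoded” behavior described below.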

Long story short, Squarto now compiles Scrivener Styles, so I can use Scrivener’s Styles instead of typing routine Markdown tags. Squarto still lets me write raw Markdown, too (it is respected by default).

The best part is that the styles are not “hardcoded” – rather, the style info is extracted from the .scriv package at the time that Squarto is run, so if you add/delete/edit a Style via Scrivener, or if you change the prefix/suffix in the compile format settings, the next run of Squarto will detect and implement the updated info automatically.

I’m pretty excited about the progress so far. Now that this is working, I am going to fine-tune my Styles so that they look better within Scrivener.

One thing I’m having trouble with: converting Emoji. Interestingly, other Unicode symbols work, but Emoji cause an error (so Squarto catches the error and replaces the Emoji with a ? symbol). It would be nice to get Emoji working, though. Luckily, Quarto does have extensions for fancy icons, like Font Awesome and Academicons.

Next on my to-do list:

  • Work on processing figures/images that are stored in the Binder or embedded (cut-n-pasted) into one of the text documents.
  • Get nested text documents collated into the parent as a single chapter.
  • Refactor Squarto into a CLI that accepts the .scriv package filename as an argument.
  • Set up Squarto to run via the File > Compile command (so I don’t have to run it separately at the command line).

Again, a big THANK YOU to Keith Blount and the rest of the Scrivener development team for making the .scriv file so parsable. It is creative, effective, and very forward-thinking.

Cool!!! How do you parse the actual RTF? Do you first extract style tags like <$Scr_Cs::0> and then run it through an existing RTF library, or do you do all the processing yourself? Keith’s choice to use plain text for everything definitely pays off, both in flexibility for off-target use and in future-proofing: everything in a project bundle is so clearly parseable that even if the Scrivener app disappeared tomorrow, our projects would still be readable…

@AmberV – maybe this ↑ post can be broken out to a shiny new thread about Squarto, as we are slowly veering away from the original topic of this thread?

The general algorithm is:

  • Find the content.rtf file in .scriv/Files/Data/UUID
  • Move the content.rtf file into the proper location in the Quarto Book directory tree
  • Convert .rtf → plaintext, adding back the relevant Markdown tags based on the Styles info, and save it as .qmd (the filename being a slugified version of the name used in the Binder)
  • Post-process the .qmd file to clean up any artifacts (e.g., to enforce whitespace preferences)
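Steps 1 and 3 (locating each content.rtf and naming its .qmd file) could be sketched like this with the standard library’s ElementTree. The .scrivx layout assumed here (BinderItem elements with a UUID attribute, a Type of DraftFolder for the Draft, a Title child, and nested items under Children) is what Scrivener 3 projects appear to use, but treat it as an assumption. `plan_exports` and this ASCII-only `slugify` are illustrative, not Squarto’s code.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def slugify(title):
    """Lower-case, keep ASCII letters/digits, join with hyphens: 'Part I: Intro' -> 'part-i-intro'."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def walk_binder(item, depth=0):
    """Yield (uuid, title, depth) for a BinderItem and all its descendants, in Binder order."""
    yield item.get("UUID"), item.findtext("Title", default=""), depth
    for child in item.findall("Children/BinderItem"):
        yield from walk_binder(child, depth + 1)


def plan_exports(scrivx_data, scriv_dir):
    """Map each document under the Draft folder to its content.rtf and a slugified .qmd name.

    scrivx_data: contents of the .scrivx file (bytes or str)
    """
    root = ET.fromstring(scrivx_data)
    draft = root.find(".//BinderItem[@Type='DraftFolder']")
    if draft is None:
        raise ValueError("no DraftFolder BinderItem found in .scrivx")
    plan = []
    for child in draft.findall("Children/BinderItem"):
        for uuid, title, depth in walk_binder(child):
            plan.append({
                "title": title,
                "depth": depth,  # 0 = top level of the Draft; >0 = nested
                "rtf": Path(scriv_dir) / "Files" / "Data" / uuid / "content.rtf",
                "qmd": slugify(title) + ".qmd",
            })
    return plan
```

Reading the file as bytes (for example `Path("My.scriv/My.scrivx").read_bytes()`) lets the XML declaration at the top of the .scrivx handle the encoding. The `depth` field is what a later step could use to collate nested documents into their parent chapter.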

I initially used pypandoc to handle the .rtf → plaintext conversion, but pypandoc doesn’t preserve whitespace as written during the conversion to plaintext. (Pandoc collapses newlines.) Preserving whitespace (especially newlines) is pretty important in Quarto when dealing with paragraphs, divs, embedded LaTeX, etc. So I found an algorithm on GitHub for .rtf → .txt conversion that worked better, and I incorporated that.
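For the curious, here is a rough sketch of a newline-preserving RTF stripper in the same spirit (a generic illustration, not the GitHub code Squarto adopted). It keeps \par, \line, and backslash-newline as real line breaks and skips font and color tables. It also illustrates one plausible cause of the Emoji problem mentioned earlier: RTF writes characters outside the Basic Multilingual Plane as two \u escapes (a UTF-16 surrogate pair), and unpaired surrogates raise an error when written out as UTF-8 unless they are recombined. Tables, images, and many other RTF features are simply ignored.

```python
import re

# Groups whose contents are metadata rather than body text (not an exhaustive list)
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info",
                     "listtable", "listoverridetable", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word with optional numeric argument
    r"|\\'([0-9a-fA-F]{2})"      # \'hh: one byte in the document code page
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \{ \\ \* or backslash-newline
    r"|([{}])"                   # group open/close
    r"|[\r\n]+"                  # raw newlines in the RTF source are not content
    r"|(.)",                     # plain text character
    re.DOTALL,
)


def rtf_to_text(rtf):
    """Strip RTF markup, turning paragraph/line breaks into real newlines."""
    out, stack = [], []
    ignorable = False   # True inside a group whose text we don't want
    uc_skip = 1         # fallback chars to skip after \uN (changed by \ucN)
    skip = 0
    for m in TOKEN.finditer(rtf):
        word, arg, hexcode, symbol, brace, char = m.groups()
        if brace:
            skip = 0
            if brace == "{":
                stack.append(ignorable)
            else:
                ignorable = stack.pop() if stack else False
        elif symbol is not None:
            skip = 0
            if symbol == "*":
                ignorable = True          # \* marks an optional destination
            elif not ignorable:
                if symbol in "\\{}":
                    out.append(symbol)    # escaped literal character
                elif symbol in "\r\n":
                    out.append("\n")      # backslash-newline is a paragraph break
        elif word:
            skip = 0
            if word in SKIP_DESTINATIONS:
                ignorable = True
            elif ignorable:
                continue
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            elif word == "uc":
                uc_skip = int(arg or 0)
            elif word == "u" and arg:
                code = int(arg)           # RTF stores values above 32767 as negatives
                out.append(chr(code + 0x10000 if code < 0 else code))
                skip = uc_skip
        elif hexcode:
            if skip:
                skip -= 1                 # this byte is the \uN fallback
            elif not ignorable:
                out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif char:
            if skip:
                skip -= 1
            elif not ignorable:
                out.append(char)
    # Emoji arrive as two \u surrogates; a UTF-16 round trip joins each pair
    text = "".join(out)
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")
```

The final round trip is the part that matters for Emoji: without it, the string holds two lone surrogate code points, and writing it with UTF-8 fails with a “surrogates not allowed” error.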

I continue to iterate on Squarto. It’s a fun project with a lot of potential. It’s essentially a Quarto-specific compiler for Scrivener. But (thanks to your suggestion from last week!!) I’ve experimented with ways to use Scrivener’s UI for setting up Compile settings and Styles. My favorite part of Squarto, though, is the very first thing I worked out: the very visual method of “laying out” the book structure in the Binder. It works so well, and is a million times better than creating .qmd files by hand and adding the filenames manually into the _quarto.yml file.

If we split this off into a Squarto thread, I’ll be happy to explain my approach and the code for anybody interested. Sooner or later (once I refactor it a bit into a Python library with a CLI), I will post it to GitHub. It’s working reasonably well already, but it’s clearly “alpha” software and a work-in-progress.

As for this original thread, “Thoughts for academic Markdown in a future version of Scrivener,” my experiments with Squarto have helped me clarify what the Scrivener 4 Compiler would have to do:

  • The Compiler should have a “DAG” (directed acyclic graph) interface that basically allows you to specify a pipeline or workflow for each element in the Binder: step 1 → step 2 → step 3 → etc. So instead of the Compiler going from “top to bottom” (simply aggregating the text documents in the Binder), it should allow each text document to be processed separately. Some of these text documents may then get aggregated with the ones below them, but others may be totally separate things (for example, saved as separate files).
  • On that note, the Compiler should allow saving files stored in the Binder to a specific directory.
  • Don’t limit frontmatter and backmatter to one element in the Binder; rather, allow Compiling anything in the Binder, even if it is outside of the DraftFolder.
  • Facilitate specification (via a settings pane) of the location of paths to Pandoc, Quarto, .csl and .bib files, etc., and maybe even a preferred virtual environment.
  • Allow running pre-processing and post-processing scripts in a full shell that respects paths and virtual environments.
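To make the first bullet more concrete, here is a toy model of a per-document DAG using Python’s graphlib (3.9+). It is purely a hypothetical illustration of the proposal, not something Scrivener’s Compiler offers: each Binder document is a task, and a downstream task receives its upstream results, so children can be aggregated into a parent while another document is emitted as a separate file.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_dag(tasks, deps):
    """Run each task after its upstream tasks, passing their results as keyword arguments.

    tasks: name -> callable
    deps:  name -> list of upstream task names (every upstream name must also be a task;
           tasks missing from deps have no upstream). A cycle raises graphlib.CycleError.
    """
    graph = {name: list(deps.get(name, ())) for name in tasks}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](**upstream)
    return results


# Hypothetical Binder: a chapter with two nested children, plus a separate appendix
tasks = {
    "intro": lambda: "# Introduction\n",
    "nested_1": lambda: "Nested 1\n",
    "nested_2": lambda: "Nested 2\n",
    # Children are aggregated into the parent's output...
    "introduction_qmd": lambda intro, nested_1, nested_2: intro + nested_1 + nested_2,
    # ...while this document stays its own file
    "appendix_qmd": lambda: "# Appendix\n",
}
deps = {"introduction_qmd": ["intro", "nested_1", "nested_2"]}
results = run_dag(tasks, deps)
# results["introduction_qmd"] == "# Introduction\nNested 1\nNested 2\n"
```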

The first bullet point (DAG interface) would be the biggest change. The others are tweaks on Scrivener’s existing compiler.

Okay, I’ve split things off from the indicated point. Let me know if I should move anything from there, or back, or if there would be a better thread title.

A quick note on the name: if you want it to be used in Italy, I don’t think it will work very well.

Without going into details, I would suggest you look up the meaning of the verb “squartare” (of which “squarto” is the first-person singular present indicative).

Paolo

Squartare means to quarter or divide into pieces, which in fact was what I had in mind as a play on words when I named it (since it chops up the Scrivener file into pieces).

Are you perhaps referring to slang, of which I am unaware?

(Being Italian-American, in fact, I thought I understood classical Italian fairly well, but evidently not modern Italian!)

Not slang, but a historical connotation that has made one particular meaning the prevalent one.

From a technical point of view, squartare is a butchering term, but because of its other meaning it is no longer used (separare is now the usual term, both in official documents and in everyday speech in the industry).

The other meaning, which started as an analogy but is now so deeply embedded in the limbic lobe that it has become the prevalent one, is the name of a form of execution used at various times and in various places, in particular as a punishment for treason.

So, the risk is that this system will only be used to publish horror stories!

Paolo

Ah, what comes to mind is being “drawn and quartered”, certainly a brutal act. This sense is perhaps antiquated in its historical meaning and only has symbolic or metaphorical usage nowadays, like other phrases such as “hung out to dry”, “stabbed in the back”, or “buried alive”.

I took squartare in its literal sense of dividing into quarters, or pieces.

Somewhere out there in the world of Scrivener users, there surely exists an author writing a story in which a character is being drawn and quartered! LOL.

Reminds me of British Leyland (the UK car manufacturer), which called one of its models the Metro, thinking the name would be acceptable around the world and forgetting that in France, at the time, the Paris Metro was considered a joke.

@nontroppo,
I’ve been brainstorming how to implement your suggestion from earlier re: allowing the Scrivener Compiler to perform the .rtf > .md conversion (this way we can use all of the bells and whistles that Scrivener has to offer) but then to split the aggregated .md file back into the components of the Binder. My goal is to still use my Squarto compiler to create the folder/file tree (by reading the Binder) and to dynamically populate the _quarto.yml file with the parts/chapters/appendices info, but I would replace my existing content.rtf > .qmd conversion function with a simple cut-n-paste of the corresponding section of Scrivener’s aggregated .md file.[1] (The trickiest part will be to associate the parsed sections with the target .qmd files, but this seems doable since the order of the Binder is preserved in the output, and Squarto keeps track of the Binder in that same order.)

In the compile format settings for Squarto, I figured out how to specify separators before/between each text document:

This results in a nice, clean, aggregated .md document with the separators in the appropriate places. The one thing that I can’t figure out is how to not add a separator for nested text documents. For example, for these two nested documents:

I get this in the aggregated .md output:

As you can see, I get the nice !!!!! ----- ----- !!!!! separator between text sections. This is desirable when I am at a chapter-level text document. But I would prefer not to have a separator between the parent text document and first child (‘Test nested 1’), or between children (e.g., between ‘Test nested 1’ and ‘Test nested 2’). If I can manage to exclude the separator for children, then my Python script would parse the parent with its children, together, which is the desired functionality for nested text documents. (This would allow the user to work on a lengthy chapter in sections, but when compiling, it all gets exported as one chapter.qmd for Quarto.)

Here’s what I would want it to look like…

Do you have any suggestions for how to not add separators for the nested text documents?

(Oh! One caveat is that the solution cannot require manual configuration for the user – it has to be auto-detected by Scrivener based on the structure of the Binder. The workflow that I envision is that the user will only have to worry about the organization of the Binder and the contents of each chapter, but not have to tweak the Compile settings just because a new part or chapter or nested text document happened to be added. In other words, whatever solution we come up with in the compile format settings, it needs to be auto-applied every time a nested text document is created by the user.)

THANKS for the continued support! It’s nice to find a kindred spirit who likes hacking away at making tools.

Best,
–Alexander.


  1. While there are advantages to my DIY .rtf > .qmd conversion, such as implementing my own filters, there would also be advantages to allowing Scrivener to do its thing, so I want to test out both methods! ↩︎

Here is a sample script that parses the aggregated .md file created by Scrivener’s Compile, using the !!!!! ------ SOME TEXT HERE ------ !!!!! separators. I used regex re.split() instead of the string .split() method because (at least for now) the separators that appear before text sections are worded differently from those that appear between them.

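```python
import re
from pprint import pprint

# Path to the .md file produced by Scrivener's Compile (adjust for your setup)
COMPILED_MD_FILE = '/Users/cavalierex/CODE/Quarto/squarto/test_compiled_via_scrivener.md'


def parse_compiled_md_into_list_of_texts(path=COMPILED_MD_FILE):
    with open(path, 'r', encoding='utf-8') as f:
        compiled_md = f.read()

    # Split on every whole line shaped like "!!!!! ...anything... !!!!!";
    # ^ with MULTILINE also catches a separator on the very first line
    text_list = re.split(r'^!!!!!.*!!!!!\n?', compiled_md, flags=re.MULTILINE)

    # Drop empty or whitespace-only pieces left by leading/trailing separators
    text_list = [item for item in text_list if item.strip()]

    for text in text_list:
        # Print the first non-blank line of each text section as a sanity check
        first_line = text.strip().splitlines()[0]
        pprint(first_line)

    return text_list
```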

The printout shows the first line of each text section.

So you can see that the order is what I would expect from the Binder, which will permit me to match each text section snippet with the corresponding Binder filename.qmd – except that I need ‘Nested 1’ and ‘Nested 2’ (children of the nested text documents) to remain with the parent (‘Introduction’) as one text section, so it can be properly matched against the file for the chapter, introduction.qmd.
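The ordered-list matching described above can be sketched as follows, assuming the snippets come from the parser and the .qmd filenames come from the Binder in the same order (both inputs are hypothetical here). The length check catches exactly the nested-document problem: children that get their own separator produce more snippets than there are chapter files.

```python
def match_snippets_to_files(snippets, qmd_filenames):
    """Pair compiled-.md snippets with Binder-derived .qmd filenames by position.

    Both lists are assumed to be in Binder order; this is only safe when every
    snippet corresponds to exactly one chapter-level Binder item.
    """
    if len(snippets) != len(qmd_filenames):
        # Typical cause: nested child documents got their own separator,
        # producing more snippets than chapter files
        raise ValueError(
            f"{len(snippets)} snippets but {len(qmd_filenames)} chapter files; "
            "check separators on nested documents"
        )
    return list(zip(qmd_filenames, snippets))


pairs = match_snippets_to_files(["Intro...", "Chapter 1..."], ["introduction.qmd", "chapter-1.qmd"])
# pairs == [("introduction.qmd", "Intro..."), ("chapter-1.qmd", "Chapter 1...")]
```

Raising instead of silently truncating matters here, because a single extra separator would otherwise shift every later snippet into the wrong file.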

As usual in Scrivener there is more than one way to roast your chicken. But Section Types are the key!

The easiest is to use Section Types to control the separators directly:

Here I made a section type that has a run-in heading. I don’t want separators, so I customise that in the Section Layouts to use returns. To be honest, I never use Separators, as I prefer the next option, but that is just a preference, not a recommendation… Section Types can be applied by default to different levels of the Binder hierarchy, which gives you the automatic relationships:

With Section Types set you can also/alternatively inject text with the Prefix and Suffix fields of the Section Layout panel:

So you can use Suffix for each section layout to either inject or not inject your split marker text.

See if either of these works for you. In general, Section Types are a powerful addition to the Binder (you can also map which Section Types are the defaults for which Binder hierarchy levels; see the Project settings for that).

Thanks for this instructive reply. This is what I meant before about the “nooks and crannies” of the Scrivener UI. The options are too spread out sometimes.

Since the “separator method” was not going to work, I tried your “Section Type” + “Section Layout” method. This will certainly work in the long run. But presently there is some trouble getting the section type+layout to be auto-assigned according to ‘Structure-Based’ rules in Compile.

I set up the section type+layout as you instructed. In the project settings window, you can see how the Sub-section (nested text file) type is appropriately applied to the nested documents that are conveniently highlighted in the Binder. (Thank you KB for adding that nice little touch!)

But when I go to Compile, the same exact text files are not auto-assigned to Sub-section, as they should be.

Instead, I have to manually change the section type…

If I compile after manually changing the section type, then it compiles correctly…

And the Python parser works correctly – notice the ‘Introduction’ appears by itself, without the nested documents being listed. (The text for the children ‘Nested 1’ and ‘Nested 2’ was captured as part of the parent ‘Introduction’.)

So there is one challenge left…
Any idea how to make the auto-assignment via ‘Structure-Based’ Compile work correctly?

Also, is there a way to include the # Section Header # in the prefix/suffix of the Section Layout? If so, that would be a tremendous boon! The Section Header name via Compile is the name of the element in the Binder (not any H1 header appearing in the text document), and this corresponds (in a slugified way) to the filename.qmd that the chapter snippet will be inserted into. So that would be even better than my “ordered list” approach (which depends on the assumption that the order of the parsed snippets is the same as the order of the Binder elements extracted from the XML).

This is the value that the “Title” in the Section Layout definition will use. You can then use the Title prefix/suffix to wrap it in whatever delimiters you want. Or you can assign a Style to it.

@kewms, Thank you for that suggestion! I had placed the prefix/suffix in the tabs named “Prefix” and “Suffix” – not in the “Title Options” tab, where another prefix/suffix option exists.

For other Scrivener users who may be confused, the prefix/suffix in the “Title Options” tab will only append something before/after the title, and this will appear at the top of a section.

In contrast, the “Prefix” tab will allow you to place arbitrary text at the very top, before the section content – and also before the title (with its own prefix/suffix that was set up in Title Options). The “Suffix” tab allows you to place arbitrary text at the very bottom, after the section content.

So now, I entered the “Title Prefix” and “Title Suffix” as seen in the screenshot, and I deleted the “Prefix” and “Suffix”. I only wanted a delimiter above each text section that included the title from the Binder. So it appears as ┏━━━━━┫Title┣ in the compiled .md file, as seen below.

And these delimiters are easily parsed by the Python code in Squarto, capturing dictionaries with {slugified_binder_title, binder_title, first_line, and content}. The entire Scrivener document is stored in a list of these dictionaries. So to refer to a specific element from the Binder, let’s say the 5th element,[1] the content (parsed_texts[4]['content']) is written to the corresponding file (named parsed_texts[4]['slugified_binder_title'] plus .qmd extension).
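The parsing step can be sketched as follows, assuming each delimiter sits on its own line exactly as ┏━━━━━┫Title┣ (one or more ━ characters). The `slugify` here is a simple ASCII-only stand-in, and `parse_compiled_md` is a hypothetical name; the dictionary keys mirror the ones listed above.

```python
import re

# Assumed delimiter line produced by the Title Prefix/Suffix: ┏━━━━━┫Some Title┣
DELIMITER = re.compile(r"^┏━+┫(.*?)┣[ \t]*$", re.MULTILINE)


def slugify(title):
    """'Test nested 1' -> 'test-nested-1' (ASCII letters and digits only)."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def parse_compiled_md(compiled_md):
    """Split Scrivener's compiled .md into one dict per titled section."""
    # With one capture group, re.split returns [preamble, title1, body1, title2, body2, ...]
    pieces = DELIMITER.split(compiled_md)
    parsed_texts = []
    for title, body in zip(pieces[1::2], pieces[2::2]):
        content = body.strip("\n")
        lines = content.strip().splitlines()
        parsed_texts.append({
            "slugified_binder_title": slugify(title),
            "binder_title": title,
            "first_line": lines[0] if lines else "",
            "content": content,
        })
    return parsed_texts
```

Writing a chapter out is then one call per dict, e.g. `Path(out_dir, d["slugified_binder_title"] + ".qmd").write_text(d["content"] + "\n")`, although the actual target folder would come from the Binder structure rather than a single `out_dir`.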

SQUARTO STATUS CHECK 2024-09-30:

  • This little diversion has now enabled Squarto to parse the Scrivener compiled .md file into chapters that correspond to the individual items in the Binder; moreover, grouped (nested) text files are aggregated into one chapter named after the topmost group title.[2] Thanks to @nontroppo for suggesting that I explore this option.
  • This is a different approach to “getting the content” than the one I described earlier for Squarto, in which Squarto managed the content.rtf > filename.qmd conversion. With this new approach of parsing the compiled .md file, we allow Scrivener to handle the .rtf > .md conversion (so the user can take advantage of any of the other features Scrivener offers in its Compile workflow) but still get the content to where it needs to be for Quarto, in a folder/file hierarchy that is also outlined in the _quarto.yml file.
  • Ultimately, I think Squarto will allow for both approaches. If it is run as a post-processing script from within Scrivener’s Compile window, it will start by parsing the compiled .md file. On the other hand, if Squarto is run from the command line directly on a .scriv file (without opening Scrivener), it will process everything automatically by itself, without needing Scrivener to do the compilation.

  1. Remember, Python lists are 0-indexed. ↩︎

  2. Unfortunately, Scrivener keeps assigning the nested text documents to Section Type “Chapter” instead of “Sub-section”, even though they are set up correctly in Project Settings. Thanks to @kewms for having me check this. ↩︎

Make sure that you haven’t manually assigned a Section Type to the problem documents. Manual assignments override the automated ones.

SQUARTO DEVELOPMENT LOG 2024-09-30:

Binder functionality:
✓ Use the Binder as the visual control for parts/chapters/appendices
✓ Parse the .scrivx file to capture the Binder structure in code
✓ Create Quarto folder/file hierarchy based on Binder structure
✓ Use slugified version of Binder title to create filename.qmd.
✓ Dynamically populate _quarto.yml with the folder/file pathnames for a Book
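Populating _quarto.yml could look something like this. Quarto’s book format lists files under book: chapters:, groups them with part: entries that have their own chapters: list, and puts back matter under appendices:. Quarto also expects index.qmd as the book’s first file. The input shape (strings for chapters, (title, paths) tuples for parts) is a hypothetical choice, and the YAML is written by hand without escaping, so a YAML library would be safer for titles containing quotes.

```python
def quarto_book_yml(title, chapters, appendices=()):
    """Build the text of a _quarto.yml for a Quarto Book.

    chapters:   items are either a .qmd path (str) or a
                (part_title, [qmd paths]) tuple for a Quarto "part"
    appendices: .qmd paths listed under book: appendices:
    Note: titles are wrapped in double quotes without escaping.
    """
    lines = ["project:", "  type: book", "", "book:", f'  title: "{title}"', "  chapters:"]
    for item in chapters:
        if isinstance(item, str):
            lines.append(f"    - {item}")
        else:
            part_title, part_chapters = item
            lines.append(f'    - part: "{part_title}"')
            lines.append("      chapters:")
            lines.extend(f"        - {path}" for path in part_chapters)
    if appendices:
        lines.append("  appendices:")
        lines.extend(f"    - {path}" for path in appendices)
    return "\n".join(lines) + "\n"


print(quarto_book_yml(
    "Test Book",
    ["index.qmd", ("Part I", ["part-i/introduction.qmd", "part-i/chapter-1.qmd"])],
    ["appendices/notes.qmd"],
))
```

The Binder walk would supply the `chapters` list, with top-level folders becoming parts, so adding a chapter in the Binder only changes the input, never the YAML by hand.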

Extracting content (the original DIY way):
✓ Navigate the /Files/Data directory to find content.rtf, content.styles, etc.
Convert .rtf > plaintext (for .qmd) using Pandoc (doesn’t handle unicode/Emoji)
Convert .rtf > plaintext (for .qmd) using my own converter (doesn’t handle unicode well)
✓ Convert .rtf > plaintext (for .qmd) using TextUtil (handles unicode/Emoji well)
✓ Postprocess the plaintext to convert Scrivener styles to Markdown
❏ Work on method for collating nested text documents into a single chapter.

Extracting content (using the Scrivener Compile .md):
✓ Have Scrivener compile a single, aggregated .md file with all Style substitutions
✓ Use Section Type and Section Layout (with Title Prefix) to add a delimiter before each chapter
❏ Fix the auto-assignment bug in Compiler: For some reason, Structure-Based auto-assigns nested text documents as “Chapter” instead of “Sub-section”, even though correctly defined in Project Settings.
✓ Collate nested text documents into a single chapter (using Section Type and Section Layout).
✓ Parse the compiled .md file into separate snippets for each chapter.

Quarto:
✓ Confirmed that quarto preview and quarto render generate .html web site
✓ Confirmed that quarto render generates a good .pdf file

To-do:
❏ Work on processing of figures/images that are stored in the Binder or embedded (cut-n-pasted) into one of the text documents.
❏ Refactor Squarto as an installable package that can be imported into other programs
❏ Refactor Squarto into a CLI program (with options for DIY processing vs parsing the compiled .md file)
❏ Set up the Scrivener format settings so that File > Compile runs Squarto as a post processor
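The CLI item might start out with argparse along these lines. The option names (`--mode`, `--compiled-md`, `--output`) and the default output directory are placeholders, not an existing Squarto interface.

```python
import argparse


def build_parser():
    """Hypothetical command line for Squarto; all option names are placeholders."""
    parser = argparse.ArgumentParser(
        prog="squarto",
        description="Convert a Scrivener .scriv project into a Quarto Book.",
    )
    parser.add_argument("scriv", help="path to the .scriv package")
    parser.add_argument("-o", "--output", default="quarto_book",
                        help="directory for the Quarto Book (default: %(default)s)")
    parser.add_argument("--mode", choices=["diy", "compiled"], default="diy",
                        help="diy: convert each content.rtf directly; "
                             "compiled: split a .md file produced by Scrivener's Compile")
    parser.add_argument("--compiled-md",
                        help="compiled .md file (required with --mode compiled)")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and check option combinations."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "compiled" and not args.compiled_md:
        parser.error("--mode compiled needs --compiled-md")  # exits with status 2
    return args
```

A console entry point would simply call `parse_args()` and dispatch on `args.mode`; when run as a Scrivener post-processing step, Compile would pass the compiled .md path via `--compiled-md`.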

Darn! This is still a problem. Despite having set up the Section Type assignments correctly, the Structure-Based auto-assignment is defaulting to “Chapter” instead of to “Sub-section”.