Inserting Paragraph Breaks with Compile

Rule17 · September 1, 2018, 2:47am

I’m hoping this is the right thread for someone who is new to Pandoc and YAML to ask what might be a stupid question.

I have written a YAML that works:

```
---
title: '<$level1_title>'
abstract: |
  Prepared for: <$custom:Clientfullname>

  Draft version: <$custom:Draft #>

  Draft date: <$longModifiedDate>

  Prepared by: <$fullname>

  Project started: <$longCreatedDate>

  Words: <$wc>

  ALL COPY IS UNPROOFED UNTIL THE FINAL VERSION

  <$img:Taleistlogo;w=150>
---
```

The problem is when I try to use replacements to insert double carriage returns between paragraphs when outputting with Pandoc to Word. (If I don’t, then Pandoc’s reference.docx sees only line breaks not paragraph breaks, which affects lists, etc.)

If I use replacements to insert a full line between paragraphs for Markdown then it breaks the YAML.

In this version, I use replacements to put in a clean line, so all the paragraphs are correctly formatted. However, Pandoc doesn’t recognise the YAML I used as front matter.

https://www.dropbox.com/s/dncr4soejxb0ao0/yaml%20with%20replacements%20turned%20on.docx?dl=1

In this version, everything is the same except I don’t run the text through replacements to insert an extra carriage return. The result is that the YAML shows up perfectly before the table of contents, but there are only line breaks between my paragraphs.

https://www.dropbox.com/s/3p5d4ya17khdglh/YAML%20with%20replacements%20turned%20off.docx?dl=1

I confess that I learn this sort of thing by trial and error, reading forums, cobbling things together, etc. I’ve spent hours over a couple of weekends trying to work this out because I don’t like asking questions that I could find the answer to with a little work.

As is probably obvious, I’m not a script person, so I know enough perhaps to be dangerous. In short, my reach might be exceeding my grasp here, so any help would be much appreciated.

To my mind, I’m looking for a way to exclude my YAML page from being included during the replacement stage.

nontroppo · September 1, 2018, 1:06pm

Have you tried using compile metadata key pairs instead of a YAML document? Scrivener will insert the correct --- delimiters for you during compile, and for simple metadata it should be OK. Out of curiosity, why not simply use [return][return] to define paragraphs for Pandoc in the Scrivener editor? It would simplify your Pandoc workflow…

EDIT: another trick would be to make a regex replacement that only added [return][return] to sentences that ended in punctuation, that should leave YAML untouched? An example using positive lookbehind to search for punctuation: (?<=[ .?!\…])(\n)
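To see what that kind of replacement does outside Scrivener, here is a minimal Python sketch. It is an illustration only, not Scrivener's replacement engine: the function name `spread_paragraphs` and the sample text are made up, the space from the character class above is left out, and the `(?!\n)` lookahead is an extra assumption so that paragraphs already separated by a blank line are not spread further.

```python
import re

# A newline that directly follows sentence-ending punctuation and is not
# already followed by another newline. The (?!\n) part is an added assumption.
PARA_END = re.compile(r"(?<=[.?!…])\n(?!\n)")


def spread_paragraphs(text: str) -> str:
    """Turn single returns after punctuation into blank-line paragraph breaks."""
    return PARA_END.sub("\n\n", text)


# Hypothetical sample: YAML lines do not end in punctuation, body text does.
sample = (
    "---\n"
    "title: Test\n"
    "---\n"
    "First paragraph.\n"
    "Second paragraph?\n"
    "- item one\n"
    "- item two\n"
)

if __name__ == "__main__":
    print(spread_paragraphs(sample))
```

The YAML block and the list items pass through unchanged because none of those lines end in `.`, `?`, `!` or `…`; only the two body paragraphs gain a blank line after them.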

Rule17 · September 3, 2018, 1:18am

Firstly, thanks so much, nontroppo, for taking the time to reply. As I said, I appreciate I’ve knowingly swum out of my depth then imposed on more knowledgeable people for help.

I’m afraid I don’t know what that means. It sounds great, though, so if you had a moment to elaborate… (But I will also Google.)

This I understand, so I know it increases the imposition, but I’ve tried it.

Within Scrivener, I don’t like the way [return][return] looks when I’m trying to read in the editor. I don’t like the gulf between the paragraphs; it interferes with the flow. I was going to say that was a superficial reason, but we’re all writers here, so I’m probably safe in thinking spacing is important to flow and meaning. So not superficial but subjective and [return][return] isn’t to my taste.

Thank you for this. I’m trying to teach myself RegEx and I’m getting better, but my expressions are still clumsy. Wrapping my head around lookbehinds is an ambition! I do have one that was working for quotations by doing a lookbehind for “>”. I will give this a go; you’re right that I don’t need to end anything in my YAML title page with punctuation, so as long as [return][return] doesn’t mess up lists, I should be fine.

EDIT: I tried this and it worked a charm on this simple document, except that the last item in the list didn’t end with punctuation:

nontroppo · September 3, 2018, 2:06am

You can enter meta-data which gets converted into your YAML within the compiler (this can be global to any compile output as shown in the screenshot [§23.4.2 of the user manual] and/or you can edit it for specific compile formats):

…generates this:

```
---
title: Test
author: Joanna Doe
abstract: |
  **Prepared for:** <$custom:Clientfullname> \
  **Draft version:** <$custom:Draft #> \
  **Draft date:** <$longModifiedDate> \
  **Prepared by:** Joanna Doe \
  **Project started:** <$longCreatedDate> \
  **Words:** <$wc> \
  **ALL COPY IS UNPROOFED UNTIL THE FINAL VERSION** \
  _© Taleist 2018. All rights reserved._ \
---
```

I think that should work for you. Regarding the spacing:

Use Scrivener lists which put space before/after at compile…
Only use [return][return] around lists, shouldn’t disrupt your flow too much.

The Regex I linked to will work fine, BUT if your list item ends in punctuation it will apply to that too, which will cause pandoc to wrap list items with paragraphs (don’t know how that looks in docx, you’d have to check).
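If the list problem turns out to matter, one possible workaround is to handle the text line by line rather than with a single regex. The following Python sketch is only an assumption about how that could look as a post-processing step on the compiled Markdown; `LIST_ITEM`, `needs_break` and `spread_outside_lists` are hypothetical names, and it assumes simple one-line list items and YAML lines that neither end in punctuation nor start with a list marker.

```python
import re

# Assumed list markers: -, *, + or "1." / "1)" followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
SENTENCE_END = (".", "?", "!", "…")


def needs_break(current: str, following: str) -> bool:
    """Decide whether a blank line belongs between two adjacent lines."""
    if not current or not following:
        return False  # a blank line is already there, or the text has ended
    current_is_item = bool(LIST_ITEM.match(current))
    following_is_item = bool(LIST_ITEM.match(following))
    if current_is_item and following_is_item:
        return False  # keep neighbouring list items together
    if current_is_item != following_is_item:
        return True  # entering or leaving a list
    return current.rstrip().endswith(SENTENCE_END)


def spread_outside_lists(text: str) -> str:
    lines = text.split("\n")
    out = []
    # Pair every line with the one after it; the last line is paired with "".
    for current, following in zip(lines, lines[1:] + [""]):
        out.append(current)
        if needs_break(current, following):
            out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    print(spread_outside_lists("title: Test\nIntro line.\n- one.\n- two\nAfter list.\nEnd."))
```

With this rule a list item can end in punctuation or not: the items stay together, and the list as a whole still gets a blank line before and after it.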

AND to learn Regexes, you really should play with regex101, the most amazing online regex editor. Here is your regex to play with:

regex101.com/r/qU1X2m/1

Rule17 · September 3, 2018, 10:18am

You are a star, nontroppo. Thank you!

I’m outputting a lovely title page using your method. May I just check that I can’t include placeholders like <$wc> and expect them to be replaced during compile?

When I had my YAML metadata file at the head of the document to be compiled, the placeholders were replaced. When I use the same syntax in the Metadata compile format pane, the title page is created but the placeholder hasn’t been replaced, i.e. it still says <$wc>.

If that’s the way it is, so be it. I just wanted to be sure I wasn’t doing something wrong.

nontroppo · September 3, 2018, 3:38pm

Hm, I don’t really know if this is expected behaviour or a bug.

Other fields are expanded, e.g. <$date> or <$title>, but <$wc> and some others aren’t. If I use a document called meta-data in the binder (a special name that gets treated as document metadata), then <$wc> also fails to get converted, but if I rename the SAME document as front-matter then it transforms <$wc> into the actual word count.

I assume Keith has some placeholders he disables in metadata, although I don’t know why <$wc> is among them as it would seem a fair place to put a wordcount to me?

Attached is a test.scriv project that contains a document called meta-data where <$wc> doesn’t expand, and an abstract2 item in compile metadata with the same effect.

test.scriv.zip (59 KB)

Rule17 · September 3, 2018, 11:40pm

Thanks again. It seems your test file has been downloaded a few times, so you might be helping more people than me, which is brilliant.

I can’t wait to go through this, and I’ve added RegEx 101 to the “learn something new” section of my weekly planner.

I’m very grateful for your help and the time.

nontroppo · September 4, 2018, 1:08am

You’re welcome, and let’s hope AmberV or KB can have a look and see if they can explain the reason some placeholders expand differently depending on how they are assigned. Happy writing!

AmberV · September 4, 2018, 9:05am

Another approach to handling the matter of paragraph delineation is to use one of Scrivener’s tools for spreading apart paragraphs (refer to the attached project for a working sample):

In the Transformations pane of the Format designer, set Convert to plain text: Paragraph spacing.

Now at this point, if the sort of paragraph formatting you prefer to write with uses a small amount of spacing between paragraphs (anything over half a line height will do), then that might be the only adjustment you need. But if you prefer indents to mark paragraphs in the editor, then you’ll need to take an additional step.
In Section Layouts, adjust the layout(s) used to format body text. At the bottom of the “Formatting” tab, enable Override text and notes formatting.
Use the paragraph spacing tool in the mock editor to add at least one empty line between paragraphs. (Note that since the writing of this original post, all of the Markdown-based compile formats have this formatting applied already, so all you need to do is switch the override setting on for each layout that needs it.)

So with this adjustment we are using Scrivener’s rich text foundation to transform the formatting of the document, specifically targeting those areas that contain body text paragraphs, and then telling the engine to convert that visual formatting into literal whitespace in the plain-text output.

We can thus have sections of the text blocked out as being protected from that transformation. In the provided example project, you will find a “Table” section type, with a simple pipe table example in the binder. The way the compiler is set up is to leave Table type documents alone—simply output them as-is.

Another example I provided is found in the document called “section one”, toward the top of the file. We have here a simple definition list structure that we would not want to be pulled apart by line spreading. While at first glance this may look like it will be a problem from the editor’s point of view, if you compile you’ll note the def list comes out properly formatted. What’s going on here? If you go back into the editor and check the Format Bar, you’ll note that these paragraphs have been styled. The style is doing nothing special other than making them look a little prettier in the editor, which is mostly cosmetic save for where they insert a little strategic whitespace between elements, whitespace that will become literal whitespace where needed. But more importantly the fact that they are styled at all means they will not inherit Section Layout formatting. Their lines will not be broken apart.

So that gives you two different tools to override paragraph spreading: Type-based and text level Style overrides.

Likewise the “Metadata” document will have its formatting left alone given its special stature as being somewhat aloof from the output. (And that is also why its placeholder expansion isn’t quite the same; as I understand it this document is basically removed from the compiler internally and only added later on. We only recently got other types of document placeholders working within it. The word count placeholder might be an oversight, or it might be technically difficult to work in.)

Replacements cannot have that kind of specificity—clever use of regular expressions aside—they target the whole output, whereas formatting can be surgically applied and overruled as needed, and so generally that is the approach I would advise.

It does have its weaknesses. Say you wrote your pipe table using your normal paragraph formatting that has a cosmetic empty line between paragraphs and just let it be. Well, Scrivener will damage the table in that case. It does require you to think a little bit about what a document looks like, since its cosmetics will impact its structure. And while you may not like the double-return paragraph look cosmetically, you may not want to dive that deep into the muddy waters of formatting == structure.

Rule17 · September 5, 2018, 6:05am

Thank you, AmberV.

It says something about me that I’m inordinately excited about the idea of playing with these files at the weekend when I have a clear run of time. I suspect, however, that in this forum I’m among people who understand.

Off topic, but I love Scrivener with a passion. It’s the only software I ever worry how I would do without. Everything else is replaceable, but our copywriting business and process are built around it. And I know I’m only scratching the surface of what it can do; it took me years, for instance, to start with Pandoc, and that’s been a revelation on its own.

AmberV · September 5, 2018, 9:36am

Let me know if you find any improvements to the approach. Of course if the RegEx is working fine for you it may just be a point of curiosity, but I figured it would be worth it to have these dedicated alternatives “documented” on the forum as well for anyone else that comes along. Would you mind if I split this conversation off into one about paragraph spacing specifically? I just don’t know if people will find these solutions with the header being about YAML.

Quite right.

Thanks for the kind words; it’s always nice to hear how Scrivener gets used. Markdown fundamentally changed how I approach writing as well, so I certainly do understand how that goes.

Rule17 · September 5, 2018, 10:00am

Splitting makes sense to me. You’re right: I came to the problem through trying to set up my YAML, but the problem is paragraph spacing.

Inspired · February 25, 2022, 4:23am

Hi there,
I have a document in which I’ve used Multimarkdown for a few basic things (headings mostly).
When I compile the document to Word docx format, I have selected “Convert Multimarkdown to rich text in notes and text”. This works, in that it creates headings in the output file, except that carriage returns get dropped out of regular paragraph text. So all paragraphs between headings end up in one large paragraph.

What am I overlooking?

AmberV · February 25, 2022, 12:34pm

Please refer to this post above, for the best approach of salvaging non-Markdown paragraph formatting into something Pandoc or the MultiMarkdown conversion engine can understand.

By the way, I’d consider installing Pandoc and using its DOCX conversion, instead of using Scrivener’s rather limited conversion. Restart Scrivener after installing Pandoc, and then at the bottom of the file type selection at the top of the compile overview window, you’ll find a new set of Markdown conversion options available to you. Select the Word-specific compile format in the left sidebar, and edit it with the given instructions above. You will still need to have the compiler transform your single-return paragraph text to Markdown-friendly spacing.

Inspired · March 1, 2022, 8:08pm

Hi Amber,

I am having difficulty applying that post to my situation. That’s most likely due to my lack of familiarity with the compile system in Scrivener. So please bear with me.

In the Transformations pane of the Format Designer, the Convert to plain text: Paragraph spacing option doesn’t show up when I have DOCX as the selected file type (at the top of the left sidebar in the Format Designer). I assume the selected file type is the output file type, since the input is simply the Scrivener document I am working on.

If I select MMD as the filetype, I have the Convert to plain text: Paragraph spacing option, but I don’t want to output MMD. So this is where I am having trouble making sense of how to apply your guidelines.

I will look into the Pandoc option. Although for now, I have a very basic requirement for DOCX output … simply so my editor can work over the text copy and make comments, etc. I get the impression the effort needed to set up Pandoc might be excess to my requirements.

UPDATE: Installing Pandoc was easier than I was expecting. Looking over its options now.
UPDATE 2: The Pandoc output for MMD to DOCX is great. Perfect. Thanks for that suggestion.

Thanks.

AmberV · March 1, 2022, 10:35pm

Yeah, just to emphasise, the techniques here don’t really work if you are using the checkbox to convert MultiMarkdown content in the editor into one of Scrivener’s native file types, like its DOCX or PDF output. That checkbox assumes the content is already Markdown and not a hybrid between word processing and Markdown. Those file types don’t have plain-text whitespace transformation options because 99.9% of the time you really wouldn’t want that. If you want paragraph spacing in a DOCX, you add paragraph spacing as a formatting element to the text override; you wouldn’t want empty lines between paragraphs instead.

The only really good way of using some Markdown and some word processing conventions is to stick with a Markdown-based compile workflow, either through MMD or Pandoc. It’s a different way of working in that you aren’t using Scrivener to design the document at all. You’ll get a fairly stock looking “default Word” .docx from Pandoc. That’s something you do have control over though, through Pandoc, not Scrivener.

You can have it export a stylesheet template file, which you then open in Word and format how you want. You put that template file into the right place for Pandoc to see it, and it will use that for the default look. Basically:

Open terminal, and type in:
```
pandoc --version
```
This will print some information; the thing you are looking for is the path after “User data directory”. Copy the path completely, and use these commands to first create the path if necessary, and then change directories into it:
```
mkdir -p <PASTE PATH>
cd <PASTE PATH>
```
Type in the following command, which will create a fresh copy of the default DOCX stylesheet, into this folder:
```
pandoc -o reference.docx --print-default-data-file reference.docx
open .
```
The second command will open Pandoc’s data folder in Finder, and you should see the ‘reference.docx’ file in there.
Edit that file with Word, following the advice given in the previously linked to section of Pandoc’s documentation.

And that’s it. Now when you compile from Scrivener, it will come out looking like the sample formatting in that document.

You can also create your own document designs and use them per-compile, you aren’t stuck with one default look, but that’s perhaps getting a little out of scope for this.
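As a rough illustration of the per-document idea, Pandoc accepts a `--reference-doc` option that points at a specific styled .docx instead of the default one in the data folder. The Python sketch below assumes you are running Pandoc yourself on Markdown that Scrivener has already compiled; the helper names and the file names `draft.md`, `draft.docx` and `client-report.docx` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pandoc_command(source: str, output: str,
                         reference_doc: Optional[str] = None) -> List[str]:
    """Assemble the pandoc argument list for a Markdown to DOCX conversion."""
    command = ["pandoc", source, "-o", output]
    if reference_doc is not None:
        # Take styles from this file rather than the default reference.docx.
        command += ["--reference-doc", reference_doc]
    return command


def convert(source: str, output: str, reference_doc: Optional[str] = None) -> None:
    """Run pandoc, failing early if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(build_pandoc_command(source, output, reference_doc), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    print(build_pandoc_command("draft.md", "draft.docx", "client-report.docx"))
```

Keeping the command builder separate from the function that runs it makes it easy to check the arguments before anything is executed.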