Criticmarkup use to compile a DOCX file

luisto · August 19, 2021, 2:25pm

Apologies if I’m posting a question on compiling to DOCX to a forum that focuses on compiling TO Markdown, but I’m guessing this is the right forum for this question (hopefully I’m not wrong).

I’m using Markdown to write my book because of the portability it gives me across several different devices and tools, and I’m very happy with it… except for the fact that Scrivener’s Compile option appears not to support Criticmarkup when compiling to a DOCX file (I’ve tried the “Convert to Markdown/Multimarkdown” options, but the {>> comment notation isn’t recognized). In this forum I’ve read several suggestions around supporting Criticmarkup comments, but they all assume I’m compiling to Markdown, not to DOCX (or maybe I’m not understanding them correctly).

So, the questions:

Is there an easy way to have Scrivener process Criticmarkup when compiling to Word/DOCX formats?
I’ve tried to use the Replacements tab in Compile to delete Criticmarkup comments with a regular expression (here’s the expression for comments, by the way: {>>([\s\S]*)<<} ), but I’ve discovered that these replacements don’t appear to work when the “Convert to Markdown” options are set up. I know the Regex is correct because it finds comments in Project Find, but it won’t delete them as part of a Compile. Are Convert to Markdown options in DOCX conflicting with Regex replacement options in compile?

My current workaround is to use link syntax to hide comments (like this: [//]: # (my comment goes here) ). It works, but doesn’t support multiline, requires it to be on its own line, and is clumsy to write.

Any suggestions on how to get Criticmarkup comments processed in Scrivener for DOCX files would be appreciated.

Luis

AmberV · August 19, 2021, 3:17pm

I think to fully answer your questions I would need to know a little more about what your expectations are, but I can answer a few things that do seem to represent some points of confusion:

First and foremost, Scrivener not only supports compiling to Markdown, as a plain-text file format, but has integration with Markdown conversion engines to produce final formats, like HTML, LaTeX, ODT, ePub and more. In fact I would say most of us here are using it to produce some other kind of file than a raw .md file.

Beyond that, it supports custom command-line execution and automation, which allows integration with preferred Markdown conversion engines, or extended options that the software itself does not support.

So it’s not right to say compiling to Markdown is all Scrivener does, in the grand scheme of things. Where you would have a technical point is in saying Scrivener’s code itself, what the compiler does, is create a Markdown file and nothing further. But I would say that is splitting hairs a bit. That’s like saying Scrivener, when used like a word processor, doesn’t compile to DOCX or PDF at all, because technically it only compiles to RTF, and then uses another third-party conversion engine to make the final file types.

This distinction doesn’t really matter too much when you click the Compile button, are asked where to save a .docx file, and get a .docx file. Likewise, the above is really only worth knowing for troubleshooting and for understanding questions like this one: it tells you where Scrivener’s job stops, and where a conversion engine picks up internally.

With all that in mind: Scrivener doesn’t support CriticMarkup directly, in the same way it doesn’t really support Markdown parsing directly. It defers all of that to conversion engines that are built specifically to handle that job. The default engine it uses, the one you’ll find at the bottom of the Compile For dropdown in the compiler, is MultiMarkdown. MMD itself supports CriticMarkup, which you can read about in its documentation. It’s a good read, as it not only goes over the options available and what to expect, but also a little into why not to expect anything else (like CM annotations not turning into ODT comments).

Scrivener also supports Pandoc, if you install it and restart the software. However, to my knowledge Pandoc does not support CM directly. I assume you are using Pandoc, since you mention compiling to DOCX directly, and the MultiMarkdown engine only supports ODT, not DOCX. You shouldn’t even be seeing a DOCX Markdown option unless you have Pandoc installed.

Thus: you may be running into limitations with Pandoc, from the sounds of it. You may find some interesting ideas in this Pandoc support ticket. In short, the designer doesn’t believe combining revision markings with document processing is a good idea, since the former should be producing a source file for the latter. That is exactly what I’m saying here: you need a preprocessor to produce a valid, cohesive Markdown document as the first step in the tool-chain, and then invoke Pandoc.

While Scrivener doesn’t parse CriticMarkup, it can generate it. All of the built-in compile formats we provide as examples will do so. Scrivener’s inline annotations will turn into {>> comments <<}, and its inspector comments will {==highlight the editor text==}{>> followed by a comment <<}. In addition to that, it can make use of styles to generate the other types of CM syntax. It’s important to stress that these are all matters of configuration though, not software features. We could set things up so it produces an entirely different kind of revision tracking markup, if that’s what was wanted. In fact you could probably even set it up to support native .docx change tracking codes, if you want to get really deep into injecting your own DOCX XML into the output (if you edit the ODT or DOCX built-in compile Formats, you’ll see where I do something very similar for building an index using raw ODT or DOCX XML syntax).

To reiterate a point made above: if you have a CM processor you’d like to use, or a Markdown converter that handles CM better, then take a look at the Processing compile option pane, documented in §24.18 of the user manual PDF.

So with that background out of the way (maybe it has already answered some or all of your questions), here are some follow-up questions I have:

Is there an easy way to have Scrivener process Criticmarkup when compiling to Word/DOCX formats?

What do you mean by “process” here? Strip them out? Accept all revisions? I suspect either way, if the default behaviour is not working for you, you’ll need to supply your own command line arguments in Processing.

I’ve tried to use the Replacements tab in Compile to delete Criticmarkup comments with a regular expression (here’s the expression for comments, by the way: {>>([\s\S]*)<<} ), but I’ve discovered that these replacements don’t appear to work when the “Convert to Markdown” options are set up.

Do you mean the setting that converts rich text to MultiMarkdown? If so, you don’t want that setting. That’s for people that aren’t writing in Markdown at all, and are just looking to gain access to its high quality output. If you feed Markdown through that, it will essentially nullify it all, by escaping special characters and making a mess of stuff elsewhere.

As some of CM’s characters are “special”, I would suspect that’s why search and replace is having a hard time, because internally it’s actually \{\>\> Blah blah \<\<\}. Again, it assumes the author has no idea what markup is, and really meant to type in those characters as printed.

With that setting off though, I have no troubles stripping CM comments out of the text with RegEx. I tried with the pattern \{>>.*?<<\}, which could probably be improved, but worked fine in a simple test.
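
For anyone who wants to try these patterns outside Scrivener, here is a minimal Ruby sketch. It assumes Ruby’s regex engine treats these patterns much as Scrivener’s Replacements do, which is an assumption; the constant and method names are illustrative only. It compares the non-greedy pattern with the greedy one from the original question, and shows why neither would match if the text had already been backslash-escaped as suspected above.

```ruby
# Non-greedy pattern from the post. The /m flag lets "." match newlines in
# Ruby, so multi-line comments are covered as well.
LAZY_COMMENT = /\{>>.*?<<\}/m

# Greedy variant, equivalent to the pattern in the original question.
GREEDY_COMMENT = /\{>>[\s\S]*<<\}/

# Remove every CriticMarkup comment from the text (hypothetical helper).
def strip_cm_comments(text, pattern = LAZY_COMMENT)
  text.gsub(pattern, '')
end

sample = "One {>> first note <<}two {>> second\nnote <<}three"

# Lazy: each comment is removed on its own.
puts strip_cm_comments(sample)
# Greedy: everything from the first "{>>" to the last "<<}" is removed,
# including the word between the two comments.
puts strip_cm_comments(sample, GREEDY_COMMENT)

# Escaped text no longer contains a literal "{>>" sequence, so nothing matches.
escaped = 'One \{\>\> first note \<\<\}two'
puts strip_cm_comments(escaped)
```

The greedy form only behaves when a document contains a single comment, which may be why it looked correct in a quick Project Find test.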

My current workaround is to use link syntax to hide comments (like this: [//]: # (my comment goes here) ). It works, but doesn’t support multiline, requires it to be on its own line, and is clumsy to write.

Yeah, I’ve never been a fan of that convention either. It’s also messy to my eyes, and plus I like to insert annotations anywhere, not just on their own discrete paragraphs.

So to close, again it would be helpful to know what you want as a result in the .docx file. I think if you’re wanting to stick with Pandoc, you’ll need to find some kind of CM pre-processor though. With MultiMarkdown and ODT you can just throw in the -r or -a argument on your custom command-line, to reject or accept all CM markings, respectively. As noted in the MMD documentation, that also nukes all highlights and comments.
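
To make “accept all” and “reject all” concrete, here is a simplified Ruby sketch of what such a pre-processing pass could look like before handing the text to Pandoc. This is not MultiMarkdown’s implementation; the method name is hypothetical, and nested or malformed markup is ignored.

```ruby
# Simplified accept/reject pass for CriticMarkup (illustrative only).
# mode is :accept or :reject.
def resolve_criticmarkup(text, mode)
  raise ArgumentError, 'mode must be :accept or :reject' unless %i[accept reject].include?(mode)

  accept = (mode == :accept)
  out = text.dup
  # Substitutions {~~old~>new~~}: keep the new text on accept, the old on reject.
  out.gsub!(/\{~~(.*?)~>(.*?)~~\}/m) { accept ? $2 : $1 }
  # Additions {++text++}: keep on accept, drop on reject.
  out.gsub!(/\{\+\+(.*?)\+\+\}/m) { accept ? $1 : '' }
  # Deletions {--text--}: drop on accept, keep on reject.
  out.gsub!(/\{--(.*?)--\}/m) { accept ? '' : $1 }
  # Highlights {==text==}: drop the markers, keep the text.
  out.gsub!(/\{==(.*?)==\}/m, '\1')
  # Comments {>>text<<}: removed in either mode.
  out.gsub!(/\{>>.*?<<\}/m, '')
  out
end

draft = "A {++new ++}{--old --}{~~cat~>dog~~} {==sat==}{>> check <<}."
puts resolve_criticmarkup(draft, :accept)
puts resolve_criticmarkup(draft, :reject)
```

A script along these lines could read the compiled Markdown on standard input and print the resolved text, so that it slots in ahead of the converter on a custom command line.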

nontroppo · August 19, 2021, 3:49pm

A pandoc preprocessor for CM: GitHub - ickc/pancritic: using criticmarkup in the pandoc markdown source — but it mostly focuses on HTML and TeX, not sure what the output to DOCX may look like.

I’ve used a preprocessor written in Ruby that converts CM to HTML during compilation from Scrivener via pandocomatic; pandoc then converts that to DOCX output. Maybe worth testing?

github.com/iandol/dotpandoc/blob/master/preprocessors/criticmarkup.rb

#!/usr/bin/env ruby
#encoding: utf-8
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# The job of this preprocessor is to convert CriticMarkup to HTML,
# which pandoc can parse to e.g. Word output.
input = $stdin.read
# Substitutions: {~~old~>new~~} becomes <del>old</del><ins>new</ins>
output = input.gsub(/\{~~/, '<del>')
output.gsub!(/~~\}/, '</ins>')
output.gsub!(/(?<=\S)~>(?=\S)/, '</del><ins>')
# Additions: {++text++}
output.gsub!(/\{\+\+/, '<ins>')
output.gsub!(/\+\+\}/, '</ins>')
# Deletions: {--text--}
output.gsub!(/\{--/,   '<del>')
output.gsub!(/--\}/,   '</del>')
# Highlights: {==text==}
output.gsub!(/\{\=\=/, '<mark>')
output.gsub!(/\=\=\}/, '</mark>')
# Comments: {>>text<<} becomes a dagger whose title attribute holds the text
output.gsub!(/\{\>\>/, '<span class="comment" title="')
output.gsub!(/\<\<\}/, '">†</span>')

This file has been truncated.

See how I use it via pandocomatic: dotpandoc/pandocomatic.yaml at master · iandol/dotpandoc · GitHub
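
To see what the substitutions visible in the excerpt produce, here is a small sketch that wraps the same `gsub` calls in a method and runs them on a sample line. The method name `cm_to_html` is hypothetical, and whatever the truncated remainder of the original file does is not reproduced here.

```ruby
# Applies only the substitutions shown in the excerpt above.
def cm_to_html(input)
  output = input.gsub(/\{~~/, '<del>')
  output.gsub!(/~~\}/, '</ins>')
  output.gsub!(/(?<=\S)~>(?=\S)/, '</del><ins>')
  output.gsub!(/\{\+\+/, '<ins>')
  output.gsub!(/\+\+\}/, '</ins>')
  output.gsub!(/\{--/, '<del>')
  output.gsub!(/--\}/, '</del>')
  output.gsub!(/\{==/, '<mark>')
  output.gsub!(/==\}/, '</mark>')
  output.gsub!(/\{>>/, '<span class="comment" title="')
  output.gsub!(/<<\}/, '">†</span>')
  output
end

line = '{~~cat~>dog~~} {++new++} {--old--} {==sat==}{>>check this<<}'
puts cm_to_html(line)
```

Two things to keep in mind with this approach: the `~>` rule only fires when there is a non-space character on both sides, and a comment containing a double quote would break the `title` attribute.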

luisto · August 19, 2021, 4:14pm

AmberV and nontroppo, thank you for your replies. I’m amazed at both how quickly and how thoroughly you’ve replied to my question. This is truly an amazing community.

AmberV, in response to your overarching question, I need to explain a bit more of what I’m doing:

I’m writing my novel using plain-text files with Markdown, and I only use the italics and bold features. I’m doing this because I’m using the sync folders feature (and copy/paste every once in a while) between Windows, Android, and Linux, and I’ve become frustrated with how italics and bold make a mess in non-text formats. Markdown solved these issues permanently, and I’m very happy with it.
I am NOT using Pandoc, Scrivomatic or anything like that. I haven’t installed anything on my computer other than Scrivener itself. I’m simply using the compiler with Compile For: Microsoft Word (.docx) selected in the dropdown, and the Convert MultiMarkdown to rich text in notes and text option checkbox enabled. That’s it. The resulting Word files have italics and bold perfectly formatted.
I was wondering if there’s an easy way to use Scrivener’s out-of-the-box (box?) functionality to put comments in my Markdown text files and have them ignored when Scrivener compiles. My two ideas were a) maybe I can use Criticmarkup syntax and get Scrivener to process it, or b) use the Regex replacements option in the Compiler to remove the comments. Neither one has worked though.
As far as I understand, the only out-of-the-box way to get comments is to use the [//]: syntax. OR, I can install Pandoc/Scrivomatic and get a more flexible/robust but more complex conversion process going.

Hopefully this provides the missing context.

AmberV · August 19, 2021, 4:59pm

Okay! I suspect there may be an easier approach for you to take, if you’re willing to use Scrivener’s native annotation feature instead of CM. A lot of the above is more advanced, and showcases Scrivener’s flexibility, but for what you’re wanting to do, I don’t think you need it.

The good thing about this approach is that it has a plain-text syntax when used in conjunction with the external folder sync feature. Go into the Sharing: Sync options tab, and make sure the option at the bottom is enabled to Convert text inside (( )) and {{ }} to inline notes when syncing plain text files. The former convention is what you would use for comments (the latter is for footnotes).
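
If the synced plain-text files already contain CriticMarkup comments, a one-off conversion to the (( )) convention could look like the Ruby sketch below. It assumes the sync option described above is enabled; the method name is hypothetical, and only {>> <<} comments are touched.

```ruby
# Rewrite CriticMarkup comments {>> text <<} as ((text)), the plain-text
# convention for inline annotations described above. Whitespace just inside
# the CriticMarkup delimiters is trimmed.
def cm_comments_to_inline_annotations(text)
  text.gsub(/\{>>\s*(.*?)\s*<<\}/m) { "((#{$1}))" }
end

paragraph = 'She left. {>> check the timeline <<} He stayed.'
puts cm_comments_to_inline_annotations(paragraph)
```

It would be worth running something like this on a copy of the sync folder first, and checking how a comment that spans several lines comes through after syncing.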

So that’s how you would mark notes outside of Scrivener. Inside of Scrivener, you would use the Insert ▸ Inline Annotation command.

By default these will not compile. As I mentioned above, you can compile them, and they could end up as CM comments if you really wanted, or other commenting conventions (like HTML).

Now as for the rest, I would definitely suggest either trying the native MMD to ODT conversion, or if you really need DOCX and don’t want to make the conversion in LibreOffice, then give Pandoc a try. It has an easy to use installer, and after you run that and restart Scrivener, you’ll find a native Pandoc Markdown to DOCX compile option in the main menu.

That result is going to be a lot better than using Scrivener’s native DOCX output (recall what I said about people wanting to convert to Markdown to gain access to its better quality output). And on top of that, the option you are currently making use of is… pretty basic. Essentially Scrivener passes your document through MultiMarkdown to generate an HTML file, then converts that to RTF, and then converts that to DOCX. That’s a lot of conversion, and a lot of chance for stuff to get lost in the translation. The only small advantage to doing so is that Scrivener’s “regular” compile tools are available to you, which may make formatting a little more straight-forward. But making your DOCX look nice with Pandoc, if you don’t like the default look, isn’t terribly difficult to do. You can download its documentation from that webpage and make a nice template for yourself.
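
For comparison, running Pandoc by hand on a compiled Markdown file is a single step. The Ruby sketch below only builds the command; the file names are hypothetical. `--reference-doc` is Pandoc’s option for pointing at a DOCX file whose styles should be used, which is one way to apply the kind of template mentioned above.

```ruby
# Build the argument list for converting Markdown to DOCX with Pandoc.
# reference_doc is optional: a DOCX file whose styles Pandoc should reuse.
def pandoc_docx_command(input_md, output_docx, reference_doc: nil)
  cmd = ['pandoc', input_md, '-o', output_docx]
  cmd << "--reference-doc=#{reference_doc}" if reference_doc
  cmd
end

command = pandoc_docx_command('novel.md', 'novel.docx', reference_doc: 'template.docx')
puts command.join(' ')

# To actually run it (requires Pandoc to be installed and on the PATH):
# system(*command)
```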

I don’t think you would need to go the Scrivomatic route. It’s a great system, but aimed more at academic writing than novel writing. For simple prose with a little bold and italic here and there, stock Pandoc—or even just ODT without any external installation—should suffice!