Using pandoc to convert tutorial project to a docx file

jjs357 · May 27, 2024, 3:20pm

Discovered this pandoc command:

pandoc --extract-media ./myMediaFolder EEK-Non-Fiction-Start.docx -o ./CopyEEK.docx

which extracts the media files under the names they have inside the source .docx file (a Zip archive at its core). Those internal names are auto-generated, so they differ from the source file names I used inside Scrivener.

But this could work for me with the editorial process I will need to follow.
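Since a .docx is a Zip archive, the generated media names can also be inspected without running a conversion. A minimal Python sketch, assuming the media sit under the conventional `word/media/` folder inside the archive; `list_docx_media` is a name made up for this illustration:

```python
import zipfile


def list_docx_media(docx_path):
    """Return the sorted names of media entries stored inside a .docx archive.

    Assumes the usual layout in which embedded images live under
    word/media/ (an assumption of this sketch).
    """
    with zipfile.ZipFile(docx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Keep files only; skip any directory entries.
            if name.startswith("word/media/") and not name.endswith("/")
        )
```

Calling `list_docx_media("EEK-Non-Fiction-Start.docx")` would list the internal names, which can then be compared by hand against the source image names used in Scrivener.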

jjs357 · May 27, 2024, 3:24pm

Thanks again for the reply. The Part Header type (Header) needs to be different from the Chapter Header where the Part is unnumbered. But when I start the compile sequence, the Header and Chapter Header are both connected to the same header style and I do not yet see how to change that connection.

Maybe a little more research is required.

jjs357 · May 27, 2024, 4:00pm

I am getting close to the effect I want but as I move to the next Part heading (Part 2) which I want as unnumbered, the chapter heading that follows is numbered correctly, but the first headed subsection retains the numbering that the last subsection of Part 1 ended with. See:

Seems like I want a reset declared somewhere, but where?

jjs357 · May 27, 2024, 5:51pm

If I reset both the level 1 <$hn> AND the chapter number <$n> in the Part Text suffix:

Then everything resets so at least the sub level numbers match:

BUT, I want to let the chapter auto-numbers keep increasing and only reset the subheader numbers.

Have not gotten that to work yet.

jjs357 · May 27, 2024, 7:04pm

Well, I guess it was simple-minded operator error – can’t mix <$n> and <$hn>. When I made both chapter and section titles use <$hn_level1> I got the behavior I was looking to achieve.
E.g.

With the same change applied to the headed subsection, the chapter number keeps increasing past the unnumbered Part header boundary and the subsection number resets.
E.g.

Hoping this self-discussion might help others down the road.

ptram · May 28, 2024, 8:46am

Unfortunately not. As they slip progressively into irrelevance, word processors have stuck to the same concepts as in the eighties, and I doubt they will adopt a more modern approach inspired by page layout or web site development programs.

Images are embedded and converted into the word processor's own format, and if the program includes a command to relink the embedded images, it saves them in that new, arbitrary format and under arbitrary names.

These images are usually recompressed, and with colors coded to RGB. No fidelity to the original is even considered. No way to recover images intended for professional printing. A word processor document has to be considered just as an intermediate for text and styles, to which images will have to be relinked by the page designer at the publishing house.

A compromise I’m experimenting with is to insert both a Markdown image link and a visible image dragged/linked from the desktop. The first one will be the actual image, the second one a preview to be used while editing.

The Markdown image link and the preview image have different paragraph styles applied. In the Compile format I tell Scrivener to remove all elements with the preview image style.

In the above example, “ipath” is a variable for the full image path, that will be replaced during compile.
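Scrivener performs this replacement itself during compile. Purely as an illustration of the idea, here is a Python sketch of the same substitution applied to Markdown text; the function name `expand_image_paths` and the assumption that the placeholder appears as a literal `ipath/` prefix in the link target are hypothetical, not taken from the project described here:

```python
import re

# Matches a Markdown image link of the form ![alt text](target)
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def expand_image_paths(markdown, image_dir, placeholder="ipath"):
    """Replace a leading placeholder in image targets with a real directory.

    The literal "ipath/" prefix is an assumption made for this sketch.
    """
    def _expand(match):
        alt, target = match.group(1), match.group(2)
        if target.startswith(placeholder + "/"):
            # Drop the placeholder and keep the rest of the path, slash included.
            target = image_dir.rstrip("/") + target[len(placeholder):]
        return f"![{alt}]({target})"

    return IMAGE_LINK.sub(_expand, markdown)
```

Links whose target does not start with the placeholder are left untouched, so ordinary image links in the same text are not affected.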

The original images will be the ones that I will then use for the resulting Web site or PDF created with a page layout program. Links in the Web site will, luckily, remain the same as written in Scrivener; not so for PDF, if the intermediate exchange format has to remain something like RTF or DOCX.

I’m hoping (but not believing) that Affinity Publisher will allow importing Markdown files directly. The only intermediate formats generated by Pandoc that can preserve image links are HTML and ICML. ICML requires a bit of work to be converted to IDML so that Publisher can read it (InDesign already can), and is a bit basic. I have yet to try HTML extensively.

Paolo

AmberV · May 28, 2024, 3:38pm

As they slip progressively into irrelevance, word processors have stuck to the same concepts as in the eighties, and I doubt they will adopt a more modern approach inspired by page layout or web site development programs.

OpenDocument format supports that, and with Scrivener you can achieve an embed-free output. You can test it easily enough by switching your compiler to MMD → FODT and observing the results. A neat and tidy folder with all of your original images (if you linked them in Scrivener, if you embedded them then… well they are toast), with their original names, and an .fodt file that references them.

Of course that is a convenient output solution not a mandatory arrangement. You can link to the images from anywhere, and LibreOffice handles them simply.

I’ve heard Affinity Publisher is going to be adding ODT support at some point, so you might not have to struggle with this mediocre file format forever.

ptram · May 28, 2024, 5:20pm

Thank you for the information about exporting ODT from Scrivener. Publisher including this file format would be huge. Unfortunately, it takes a lot of fighting to get them to add supported file formats, but one can hope.

Paolo

AmberV · May 28, 2024, 5:33pm

Yeah, developers ignoring competitive formats and just implementing “Word” because that’s what most people want is part of the problem. There are better formats out there, and word processors that have gone beyond the '80s in their design. Honestly LibreOffice almost feels like a DTP, it uses different jargon and interface to describe the master page concept and full stylesheet driven design. For example I wouldn’t mess with manually numbering at all, as in the above, because the stylesheet drives numbering when you load Scrivener’s output into the template. That is all around a superior approach because moving a chapter around late in the editing process won’t require manually renumbering hundreds of headings, as though one had typed them in by hand (which is what Scrivener is doing… just faster than a human can).

jjs357 · May 28, 2024, 7:51pm

Thanks @AmberV and @ptram for your replies.
I am working with the company Elektor on a publishing project and it may be that their workflow is not really dependent on a Word docx delivery. That’s what another author that Elektor works with uses, but he has to enable the editor to correlate the figure number references, which are generated in Scrivener and shown in the .docx file, with the source image files. He also compiles an HTML output that retains references to the files as originally named, and the editor combs through the HTML to see what’s changed as new document drafts are made and delivered.
I am just starting the writing project and I am liking Scrivener’s approach to things so far, especially for auto-numbering and figure citations. Maybe my editor (waiting for an introduction) will be open to fodt instead. This has the advantage of a good Word-like document interface: formatted text and embedded figures with all numbers as compiled from Scrivener AND the source image files as siblings to the fodt file.

nontroppo · May 31, 2024, 1:22am

For Pandoc > ODT it would be quite easy to make a Lua filter to link rather than embed images in a document. The XML is really simple for ODT, and in fact Pandoc already uses links for OpenDocument output (which ODT uses under the hood):

github.com/jgm/pandoc

Option to Link Images Rather Than Embed Them For ODT

opened 02:06AM - 29 May 24 UTC

iandol

enhancement

**Describe your proposed improvement and the problem it solves.**

For many formatting workflows, editors or publishers prefer not to embed figures. ODT allows you to easily embed or link images, and in fact the `opendocument` writer already supports linking:

```xml
pandoc -t opendocument
![](placeholder.png)
<text:p text:style-name="Text_20_body">
  <draw:frame draw:name="img1">
    <draw:image xlink:href="placeholder.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" />
  </draw:frame>
</text:p>
```

BUT `odt` forces the image to be embedded, so the same markdown becomes something like:

```xml
<text:p text:style-name="Standard">
  <draw:frame draw:style-name="fr3" draw:name="Image1" text:anchor-type="as-char" svg:width="4.704cm" svg:height="1.552cm" draw:z-index="0">
    <draw:image draw:mime-type="image/png">
      <office:binary-data>iVBORw0KGgoAAAANSUhEUgAAAMgAAABCCAMAAAAlrWkSAAABUFBMVEXL…</office:binary-data>
    </draw:image>
  </draw:frame>
</text:p>
```

It would be great if there was a command-line option to allow to link to images (i.e. preserve the opendocument way for odt). This way we could generate ODT files with figures that were linked. The same technically applies to DOCX (Word does allow linking, but of course the syntax is much more complex).

**Describe alternatives you've considered.**

I imagine a Lua filter could do this, and I suspect it is a viable workaround?

For DOCX I did check the XML source, and it was horrible as usual, but I suspect just as for my Index builder filter, it could be made to work too…
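The two image forms quoted in the issue (a `draw:image` carrying an `xlink:href` versus one carrying inline `office:binary-data`) can be told apart with a few lines of XML parsing. A rough Python sketch, assuming a well-formed OpenDocument XML string with the standard namespace declarations; `classify_images` is a name invented for this illustration, and it only inspects the XML rather than converting anything:

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs used by the elements in the issue.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}


def classify_images(xml_text):
    """Report, for each draw:image, whether it holds inline data or an href.

    Note: an href may still point at a file stored inside the .odt
    package, so "href" alone does not prove the image is external.
    """
    root = ET.fromstring(xml_text)
    results = []
    for image in root.iter(f"{{{NS['draw']}}}image"):
        if image.find("office:binary-data", NS) is not None:
            results.append(("binary-data", None))
        else:
            results.append(("href", image.get(f"{{{NS['xlink']}}}href")))
    return results
```

Run against the `content.xml` of a compiled file, a check like this could confirm whether a given workflow actually produced referenced rather than inlined figures.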