Scrivener to ICML

ptram · September 28, 2022, 4:41pm

Hi,

This is a followup from another thread, where I can no longer post new replies (Literature & Latte Forums). I’ve already crowded my posts, so I think it is better to start anew.

I would like to try with prefix/suffix to convert the Scrivener styles to Pandoc syntax that will then be converted to ICML.

I’ve tried with the Pandoc syntax for custom styles, both without and with what I think is the newline character (‘\n’). Prefix:

::: {custom-style="My Paragraph Style"}

Suffix:

:::

But it doesn’t work. Prefix and suffix are added into the compiled text. If I check “Treat as raw markup”, they simply overwrite part of the included text.

Paolo

AmberV · September 28, 2022, 5:22pm

But it doesn’t work. Prefix and suffix are added into the compiled text. If I check “Treat as raw markup”, they simply overwrite part of the included text.

I don’t know what that last part means, but this would be the expected Markdown result:


...end of previous paragraph.

::: {custom-style="My Paragraph Style"}
The custom paragraph style.
:::

Beginning of next paragraph...

If it doesn’t look like that, then the newlines aren’t entered right. You want one after the prefix (⌥↩ to insert) and one before the suffix, as the default is to presume you want prefixes and suffixes to be part of the styled line.

And yes, you definitely want raw markup enabled. Just be aware that means the entirety, not just the suffix and prefix text. The interior, or the text marked with the style in the editor, needs to be valid Markdown at that point, including proper paragraph spacing, as this style setting is chiefly for passing through raw markup from the editor without processing it from rich text.
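To make the effect of those two newlines concrete, here is a small Python sketch of what the prefix and suffix amount to once they are wrapped around the styled text. The function name is made up for illustration, and this only imitates the wrapping; it is not how Scrivener implements it.

```python
def wrap_paragraph_style(text: str, style: str) -> str:
    """Wrap text in a Pandoc fenced div carrying a custom-style attribute.

    The newline after the opening fence and the one before the closing
    fence play the role of the newlines typed into the prefix and suffix
    fields.
    """
    prefix = f'::: {{custom-style="{style}"}}\n'
    suffix = "\n:::"
    return prefix + text + suffix


wrapped = wrap_paragraph_style("The custom paragraph style.", "My Paragraph Style")
print(wrapped)
```

Without the two newlines the fences end up on the same line as the text, which matches the symptom described at the top of the thread.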

ptram · September 28, 2022, 5:45pm

I confirm the newlines weren’t correctly entered, and it looks like they are needed. I entered them as you showed, and checked the “Treat as raw markup” option for that style.

The result is that the ICML file now contains that style, which is imported into InDesign as a paragraph style.

The only issue is that a “> Paragraph” extension is added to the style name. I wonder if this is something that depends on how I set up Scrivener, or on the Pandoc converter. I would like to be able to remove it, or applying a stylesheet in InDesign wouldn’t work.

Character styles are converting just fine. What I mark with the following type of prefix/suffix:

[My Text]{custom-style="My Character Style"}

translates perfectly in InDesign.

Ioa, do you know where Pandoc filters are placed, in a Mac? Maybe I can edit the ICML one on my side.

Paolo

AmberV · September 28, 2022, 6:27pm

It’ll be a good idea to bookmark Pandoc’s issue board for stuff like that. Scrivener has no idea what you are doing by typing letters between quotes in the prefix field, nor does it care why. Evidently this output is some kind of naming convention, perhaps one the original author prefers. They are active on the board, so you may get an answer as to why it is this way.

Ioa, do you know where Pandoc filters are placed, in a Mac? Maybe I can edit the ICML one on my side.

I don’t know if it has one, looking at the source. It may only be a Haskell file. Looks like a bit of a rabbit hole if you aren’t familiar with the language and recompiling software.

ptram · September 29, 2022, 9:46am

Thank you for the link! Unfortunately, by reading the full discussion, I understand that the ICML converter is far from complete, and only converts a few basic things: paragraph styles and character styles, but not object styles, and just some basic parts of the table styles.

I fear that this ends my hope of using InDesign/Publisher just for the final touchups of a project made in Scrivener. There is too much work to do there (reapplying object styles, resizing and repositioning images), so one has to stay in InDesign/Publisher as soon as text, images and tables need to be mixed.

Scrivener and the page layout programs don’t like to talk. This means that one has to either do the work in Scrivener and arrange things so that the output is good enough as is, or work in the page layout program and give up the structuring, restructuring and revising features of Scrivener.

Paolo

nontroppo · September 29, 2022, 10:20am

Pandoc filters can be written in any language (I use Ruby; you can use Python, Perl, Bash etc.), or you can use Lua via a built-in interpreter (much faster and more control). Pandoc also allows you to write custom writers using Lua. Both custom filters and writers can be anywhere on the hard disk and are specified using command flags, e.g. pandoc --filter=/path/to/my/filter for a general filter, pandoc --lua-filter=/path/to/filter.lua for Lua filters and pandoc --to=/path/to/writer.lua for custom writers.
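For a feel of what a general (JSON) filter looks like, here is a minimal Python sketch that uses only the standard library. It reads Pandoc’s JSON document from standard input, renames custom-style values on Div and Span elements, and writes the result back out. The RENAMES mapping and the style names in it are hypothetical, and the sketch assumes the usual JSON layout of Div and Span nodes (attributes first, content second).

```python
import json
import sys

# Hypothetical mapping from style names used in Scrivener to InDesign names.
RENAMES = {"My Paragraph Style": "Body Text"}


def rename_styles(node):
    """Walk the JSON document in place and rename custom-style values."""
    if isinstance(node, list):
        for item in node:
            rename_styles(item)
    elif isinstance(node, dict):
        if node.get("t") in ("Div", "Span"):
            # Attributes are [identifier, classes, key-value pairs].
            for pair in node["c"][0][2]:
                if pair[0] == "custom-style" and pair[1] in RENAMES:
                    pair[1] = RENAMES[pair[1]]
        for value in node.values():
            rename_styles(value)


def main():
    doc = json.load(sys.stdin)
    rename_styles(doc)
    json.dump(doc, sys.stdout)


# Pandoc passes the target format as the first argument when it runs a
# JSON filter, so only read stdin when that argument is present.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as an executable file, it would be called with the --filter flag mentioned above. Note that a filter like this runs before the writer, so it cannot by itself change what the ICML writer appends to a style name.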

I do not understand anything about InDesign and what the difference is between paragraph and object styles. Generally, there are two types of structure, blocks and inlines. A paragraph is a block, a span is an inline, a figure is also a block, as is a code section, emphasis is inline etc. Anything can be represented as a series of blocks and inlines, and for example HTML + CSS can do anything in terms of how to lay out these blocks. How does InDesign handle this?

Regarding your specific problem with > Paragraph, you could write a script to edit the ICML and remove this. I can quickly convert one of my existing Ruby scripts to do this if you want…
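A rough sketch of such a post-processing script, in Python rather than Ruby. It assumes the suffix shows up inside the ICML attribute values as either " > Paragraph" or the XML-escaped " &gt; Paragraph", and it only touches the style names you list, so other styles are left alone. The function name and the sample lines are made up for illustration and have not been checked against real Pandoc output.

```python
import re


def strip_paragraph_suffix(icml_text, style_names):
    """Remove a trailing ' > Paragraph' from the listed custom style names.

    Only occurrences that end an attribute value (followed by a closing
    double quote) are changed.
    """
    for name in style_names:
        pattern = re.compile(re.escape(name) + r' (?:>|&gt;) Paragraph(?=")')
        icml_text = pattern.sub(lambda match, name=name: name, icml_text)
    return icml_text


# Illustrative input only, not copied from a real ICML file.
sample = (
    '<ParagraphStyleRange AppliedParagraphStyle='
    '"ParagraphStyle/My Paragraph Style &gt; Paragraph">'
)
cleaned = strip_paragraph_suffix(sample, ["My Paragraph Style"])
print(cleaned)
```

To run it on a compiled file, read the ICML as text, pass it through the function with your own style names, and write the result back; keep a copy of the original file until InDesign has accepted the edited one.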

ptram · September 29, 2022, 10:30am

Object styles in InDesign have properties not needed for paragraphs (and vice-versa). For example, you don’t need paragraph typographic composing for images or floating frames, and you don’t need kerning for images.

Also, paragraphs don’t need properties for anchoring mode, fitting of the contained image with the containing frame, way of pushing away the wrapping text, stroke and fill size, color and special effects.

Not having a way to deal with object styles leads to the most basic of page designs. If you need something more captivating, there is no way to make a more creative use of the page elements.

Paolo

nontroppo · September 29, 2022, 11:23am

Indeed the fact that InDesign doesn’t use a cascading model (like the generic <div> in HTML) makes it hard to translate to both objects and paragraphs. Now again if the difference is trivial in terms of the XML (just the tag name is different), we could easily rewrite it. Do you have a minimal example of ICML of an empty object? The IDML spec (Appendix B 1.2) says that ICML is a subset of IDML, and can contain identical elements to IDML.

The other side of this coin is whether scripting can transform content in InDesign. InDesign appears to have a complex and fully-featured scripting interface. I just downloaded the SDK (http://www.adobe.com/devnet/indesign/sdk.html) and there are many samples of what appear to be complex document modifications. It appears scripting needs to be in AppleScript, JavaScript or VBScript. As you are not a coder this may not be of help, but I expect there is a solution to transform between paragraph and object blocks in the API; you could perhaps find help on the Adobe forums or see if you can hire a specialist in scripting InDesign.

AmberV · September 29, 2022, 1:07pm

Yeah, I guess it depends on what your tolerance is for such things. I know a great many people do just that, and those who do it a lot have probably automated that to some degree. That is how publishing houses do it, automating from DOCX. Given the breadth of how one can do this, I wouldn’t necessarily say that Scrivener is in a worse position for that than most other tools. I.e. you’re going to find yourself in this position from most writing platforms, given most writing platforms are not hand-crafted stepping stones to InDesign.

That said, if you are going to be doing a lot of this, I would say the basis you are working from is probably better than any other way of using Scrivener, so it makes sense to build off of where you are coming from rather than switching gears entirely. For one thing you can easily decide to use Pandoc’s DOCX instead, rather than shoring up ICML, with the idea being not to create a perfect DOCX file but to find a ready-made solution for reducing DTP workload, as that is a much more common route after all.

Scrivener and the page layout programs don’t like to talk. This means that one has to either do the work in Scrivener and arrange things so that the output is good enough as is, or work in the page layout program and give up the structuring, restructuring and revising features of Scrivener.

Well, that depends on definitions. If one is not wanting to create a system at all and expecting there to be something in place for them—then that is fair. But as I said before: what does? The differences in writing and designing are so vast that the models for visualising data have little in common. So conversion becomes not only a problem of translating language (RTF, XML/HTML, TEX etc.), but of transforming the structure of content between radically different models. What role would an object have in a program like Scrivener, for instance? What about master pages? We can conceive of parallels, particularly contrived ones, but how useful would they be to everything and everyone?

That is always going to be the problem with generalised converters, like this ICML writer. It can handle broad strokes, but could it do a lot more, to the extent where one hardly even need open InDesign? Probably not without becoming so specifically tuned to a design that it becomes useless to anyone else, or even to anything else than one book.

Case in point, my user manual project settings reduce my post-compile workflow to 100% proofing tasks. I don’t have to touch a single thing to implement the final typesetting. But could someone take that system and use it to create another design entirely? Not without learning how it was designed and changing a great deal about it—it wouldn’t even work well for another book, seeing as how much of that automation is so very specific to the content itself, not even the house design.

So while that illustrates one end of the extreme in terms of automation, it does also illustrate that Scrivener and page layout can talk, and at times to a level of specificity that means we might as well consider Scrivener to be the agent directly orchestrating the production of the PDF files; it’s just got a lot of libraries working for it in the back end (which, if we’re going to be real, is how Scrivener gets just about anything done at all other than DOCX, which does have a purely native L&L writer).

Here’s the thing though, to loop back to the beginning of this: I don’t have to do much by way of concession in how I structure the work in Scrivener for this, and that is the advantage of creating an automation that is so very specific. Almost everything I do in the project benefits me as the writer, it is not for the design. I can get away with that because I have control over both variables.

This is true, I meant that more from the standpoint of wanting to change one aspect of the output (removing the extra string suffix from the style name) rather than starting from scratch. Indeed if you want to create your own writer, Pandoc supports most of the popular scripting languages, which is fantastic for those already familiar with one of those. You just have to learn the “API” so to speak.

As you are not a coder this may not be of help, but I expect there is a solution to transform between paragraph and object blocks in the API; you could perhaps find help on the Adobe forums or see if you can hire a specialist in scripting InDesign.

Indeed that’s probably the most common route that I referred to above: getting a DOCX in some form that an existing automation works well with, or at least is a better starting point for the pros that do this on the regular, and I do think the Pandoc approach would be superior to any other route Scrivener offers for that, but I can’t say that for sure. What I can be more sure of is that most books get made from very primitive inputs, word processing based, that come from authors and editors.

I do believe one could go as far as I described above, with InDesign instead of LaTeX (as I use), but it would probably take at least an equivalent amount of work, i.e. programming.

ptram · September 30, 2022, 9:11am

Try this one:

TestI_CML_from_IDCS6.icml.zip (12.7 KB)

It is nothing more than this thing:

Paolo

ptram · September 30, 2022, 9:23am

To be clear: the workflow I’m trying to achieve is not the classic one needed for publishing a work that will not have to change. Not a novel. In this case, you get a manuscript from the writer, and the publisher makes it a printable book. The final book is the one crystallized in the page layout program.

My use case is the one of a user manual, that has to be frequently updated and translated. The goal is to keep the master document in Scrivener, with export/import of snippets to be translated to RTF or Markdown.

This is something that seems to work as is for producing ebooks. I suspect it may work for web help, using the .html files generated for the ebook.

It doesn’t work if generating a PDF is still needed. I suspect PDFs are going to be a thing of the past for user manuals, but in most cases companies still ask for them. Some printed documents are still required by the import laws. And many still feel ‘safer’ with something with the shape of a traditional book.

Paolo

nontroppo · September 30, 2022, 11:45am

Yikes! Now the good news I suppose is the text in the anchored object is just a paragraph as exported with ICML, and the object and text frame are not part of a hierarchy (they don’t contain the paragraph), but use a ParentStory attribute to “include” the actual text. But wow, the XML markup is pretty impenetrable, object + frame takes up 6511 characters with multiple elements and a myriad of properties. No idea which elements/properties are essential and which are optional. No simple fix with a few regexes will solve this one…
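One way to get an overview before deciding what is essential is to survey the file with a short script. The following Python sketch counts the element names and lists every element carrying a ParentStory attribute; the function name and the tiny sample document are made up for illustration and are far simpler than a real ICML file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def survey_icml(xml_text):
    """Return element counts and (tag, ParentStory) pairs for an XML document."""
    root = ET.fromstring(xml_text)
    tag_counts = Counter(element.tag for element in root.iter())
    story_refs = [
        (element.tag, element.get("ParentStory"))
        for element in root.iter()
        if element.get("ParentStory") is not None
    ]
    return tag_counts, story_refs


# Invented sample, only to show the shape of the result.
sample = '<Document><Story Self="u1"><TextFrame ParentStory="u2"/></Story></Document>'
tag_counts, story_refs = survey_icml(sample)
print(tag_counts.most_common())
print(story_refs)
```

Running this on two exports that differ by a single object would at least show which elements are added for that object.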

nontroppo · September 30, 2022, 12:07pm

I think ultimately this really does force you to use something like PrinceXML or LaTeX rather than a GUI app, where you have much more precise control given these tools use text-specified layouts (HTML+CSS or TeX). If I had to do this, I would use Scrivener + Pandoc, utilising a Quarto template for very professional online HTML, a tweaked CSS for EPUB, and a tweaked PDF output using PrinceXML / DocRaptor (I think paged CSS is good enough for most requirements, and CSS is far more logical to understand compared to LaTeX). PrinceXML is ~$500 for a single licence, but if you use DocRaptor you can use it as an online service for much less. LaTeX definitely wins on the price front though…

AmberV · September 30, 2022, 1:02pm

Understood! I guess what I was trying to get at though is that even in traditional publishing, where output is relatively static and revisions infrequent, you need automation. No publishing house is going to spend weeks on each book laboriously turning a TNR 12pt DOCX file into a hard cover form factor and then turn around and do another layout by hand to mass trade or whatever.

I do know where you are coming from and what you’re trying to do, and I believe the problems we face, as people producing many frequent revisions to a complicated document, and with higher standards than accepting what the macOS/Qt PDF printer with simplistic formatting can afford, are not too dissimilar from those of the more static publishing model. I.e. making one single book dozens of times isn’t terribly different, mechanically speaking, from making dozens of books once or twice.

So there is something to be learned from the field, and more importantly what can be expected from the software in the “12pt TNR” arena, like Scrivener and Word, even if the spread of how it is meant to be used is different.

This is something that seems to work as is for producing ebooks. I suspect it may work for web help, using the .html files generated for the ebook.

Ebooks can get away with it because they are little more than self-contained static, and fairly basic (like 1999 levels of basic), web pages running in a very stripped down specialised browser. That’s simple, in terms of the technology cap, and there are good frameworks for producing these basic HTML and CSS files (with scattered XML as glue and metadata). Programmers have no end of tools there, while tools for making InDesign files are effectively zero.

Meanwhile the source material for ebooks only has to concern itself with basic instruction. There is, for example, no consideration for optimising hyphenation, worrying about page colour matters like text rivers, awkward widows & orphans; there are no master pages, no complex style sheet conversions, no objects, and little by way of sophisticated layouts, in part because your design must be flexible enough to display on a business card sized phone screen, be monochromatic enough to be legible on eInk, displayed in 52pt, etc.

Creating a static, paginated design is always going to be a lot more complicated than piping a bunch of paragraphs through a web browser (ebook reader) and having it crudely sort out pagination, justification and other matters on the fly. The result is uglier, but the input is simpler and making tools that produce simpler input is thus much easier to do. Creating an effective ebook generator is something one person can sit down and do in a few months. Hell, a person with web design background can sit down in a coding editor and make an ebook from scratch at nearly the same speed it takes them to write the book, it’s so simple…

Anyway, sorry to get a bit abstract in the thread, but it seemed all right considering a broad topic like Scrivener to ICML, or really any publication medium. I think it is okay to accept that without considerable automation or considerable manual labour, expecting a miracle conversion system isn’t realistic at this point in time, and maybe never will be, considering the conceptual (let alone technical) problems with doing so.

@nontroppo : I think ultimately this really does force you to using something like PrinceXML or LaTeX rather than a GUI app, where you have much more precise control given these tools use text-specified layouts (HTML+CSS or TeX).

I couldn’t agree more, because what is interesting about these particular approaches is that we are moving the very difficult page layout problem into a space that much more closely resembles the much easier ebook design approach. While LaTeX has less support than HTML+CSS, it’s straight-forward enough that even a Scrivener project template can be set up to generate it—and both of the major Markdown conversion tools have excellent support for it.

It’s also interestingly conducive to final polishing right in Scrivener. I can spot a problematic justification solution in the output PDF and insert the overrides I might need. Perhaps I’ll force hyphenate a word or force a line break. That’s the kind of thing you’d generally have to manually proof the PDF for every single time you compile, with an eagle eye and an ever growing checklist, to fix in a traditional DTP workflow. But with a text-based workflow I just plug the \\ code or whatever into the text editor and I’m done forever with that problem line.

With a platform such as that, one is much closer to pure one-click compile automation in theory and practice, than I think they could ever be with the GUI-tool approach, with all of the proprietary and complicated file formats that come along with it.