Stopping Markdown Output for Figure Sizes

nontroppo · September 28, 2023, 8:03am

I am currently using PDFs for images embedded in the editor from the Binder, and in general this works fine. But for some reason Scrivener always outputs {width=x height=x} with unitless values in the image links, and this causes problems: images end up tiny in Quarto to Word / LaTeX, and Typst breaks entirely before a final PDF can be rendered. This affects PDFs, SVGs and PNGs alike. With PNGs we can at least change the values manually with the Scale Image… dialog, but this dialog doesn’t apply to PDF, SVG, etc.

![My figure caption][fig1]

[fig1]: MyFig.pdf {width=595 height=326}

In my pandocomatic workflow I made a preprocessing script to fix this (forcibly remove all size info), though this script doesn’t work in Quarto. But it would be much better if we could control this somehow. For example only include sizes if a user has resized an image in the editor view, don’t apply to images that don’t have a size dialog etc.

@AmberV — is there any trick to stop this?

nontroppo · September 28, 2023, 8:27am

To partially fix my own issue: since I already use a custom Ruby preprocessor script to get Quarto to work, I added this regex to strip the sizes out:

# This regex removes {sizes} from image links, e.g. [Fig1]: Fig1.pdf {width=596 height=233}
text.gsub!(/^(\[[\d\w\s]+\]:[^\{]+)(\{.+\})/, '\1')

Full script here: https://github.com/iandol/scrivomatic/blob/master/quarto-run.rb#L72
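
As a self-contained illustration, here is a minimal sketch of the same idea wrapped in a method. The method name strip_image_sizes and the sample text are made up for this example, and the pattern is a stricter line-by-line variant rather than the exact regex from the script: it never matches across a newline, and it also drops the space left before the braces.

```ruby
# Hypothetical helper (not part of the linked script): removes a trailing
# {...} attribute block from reference-style link definitions such as
#   [fig1]: MyFig.pdf {width=595 height=326}
# Lines that do not start with "[label]:" are left untouched.
def strip_image_sizes(text)
  text.gsub(/^(\[[^\]\n]+\]:[^{\n]*?)[ \t]*\{[^}\n]*\}[ \t]*$/, '\1')
end

sample = <<~MD
  ![{#fig-pdf} My figure caption][fig1]

  [fig1]: MyFig.pdf {width=595 height=326}
MD

puts strip_image_sizes(sample)
```

Only the definition line loses its size block; the {#fig-pdf} attribute inside the caption is not touched, because that line does not begin with a bracketed label followed by a colon.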

AmberV · September 28, 2023, 11:54am

Yes, I’ve sometimes used post-processing to strip them out, but not often as I tend to never resize images in the editor (which ideally one shouldn’t do anyway, as that changes the DPI).

As for why they are unitless, I went by the Pandoc documentation on this, which declares that if a unit is left off, it assumes pixels, with the note that this may be used as a basis for conversion to a more appropriate measurement (converting px to in based on image DPI, for LaTeX, is one example given). This thus seemed the safest to me based on the wide latitude Scrivener’s output may be used for, and the fact that a more technically accurate unit might not be broadly supported (points, in this case; not too friendly with ePub, etc.).

If the images are coming out too small, you might need to set the DPI fallback, like --dpi 72, because the default is 96, which would make them smaller than intended in that scenario anyway. I’m definitely getting the correct scales in all outputs I’ve checked, if I have the DPI set to match the images.

For example only include sizes if a user has resized an image in the editor view, don’t apply to images that don’t have a size dialog etc.

Well that should be working the way you expect it to. I just ran a quick test with six images, one resized in the editor and the other left alone, for each of disk linked, binder linked and fully embedded. In each case I got:

[TestImage]: TestImage.png

[TestImage]: TestImage.png {width=265 height=170}

Note we are even optimising the output, detecting the same data is being used but with different visual display settings. MMD output is similar, though with its variation on how to do this (including the addition of ‘px’, which it requires).

Let me know if you can track down a condition where that isn’t working as it should be.

nontroppo · September 28, 2023, 11:18pm

I attach a test.scriv project where I always get figure sizes. I have a PNG and PDF, I drag-from-binder and have “Link to image” enabled (binder linked). I have not applied the scale image dialog. The PNG is 400DPI, the PDF has no DPI.

The project is pandoc markdown output, no post-processing enabled. The .md file ends up with:

…

![{#fig-png} (**a**) Interesting.  (**b**) Boring.][PNG]


…

![{#fig-pdf} (**a**) Interesting.  (**b**) Boring.][PDF]

…

[PNG]: PNG.png {width=595 height=233}

[PDF]: PDF.pdf {width=596 height=233}

test.scriv.zip (237.0 KB)

macOS Sonoma + Latest Scrivener (which is random crashing quite a bit, but I can’t pin down a cause, often when changing a setting)

AmberV · September 29, 2023, 12:44pm

macOS Sonoma + Latest Scrivener (which is random crashing quite a bit, but I can’t pin down a cause, often when changing a setting)

You might want to try saving your preferences as a preset and do a full .prefs file reset. It could be something got messed up in the upgrade. A messed up setting could cause a problem either when changing the .prefs file or when accessing a piece of interface using that setting.

Okay, I see what is going on. If you look at the RTF file itself in a text editor, you’ll see that the linked image is referenced along with its display size, which is stipulated in points. So the PNG for example is 595pts wide, but at 400 DPI the pixel width is 3308. Thus given the data it has to work with, it appears “resized”, even though you haven’t touched it.
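
The mismatch can be checked with a little arithmetic, since a point is 1/72 of an inch. This is only a sketch of the unit conversion, not Scrivener’s actual algorithm, and the method names are invented for illustration.

```ruby
POINTS_PER_INCH = 72.0

# Convert a display size in points to pixels at a given image DPI.
def points_to_pixels(points, dpi)
  points / POINTS_PER_INCH * dpi
end

# Convert a pixel size back to points at a given image DPI.
def pixels_to_points(pixels, dpi)
  pixels / dpi.to_f * POINTS_PER_INCH
end

puts points_to_pixels(595, 400).round     # 595pt display width at 400 DPI, in pixels
puts pixels_to_points(3308, 400).round(2) # 3308px at 400 DPI, in points
puts points_to_pixels(595, 72).round      # screen-resolution case
```

A 3308-pixel image at 400 DPI works out to roughly 595.44pt, which agrees with the 595 recorded in the RTF once rounded. At 72 DPI the point and pixel values coincide, which is the screenshot case.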

My testing was done with screenshots, which use the native resolution where points = pixels, and so that is how I got the result I did. So the current implementation is a bit limited, in that it only works with screen-resolution graphics. I think this could be generally improved by having the compiler’s reckoning of an image’s scale be determined by the same algorithm that generates its display size in the RTF code. That should give a more accurate understanding of whether the display data in Scrivener represents the display metadata for the image itself on the disk (binder or direct link). I think that’s probably an easy enough tweak, and it doesn’t involve changing how the image is stored or anything.

But, for now post-processing remains the answer on that score.

Units in Output

Do you think it might be a good idea for us to be more literal with what we are putting as size data into the Pandoc output, as we do for MMD? From the reading of the documentation, it seemed like it would be effectively treated correctly to let it handle things with fallbacks, but since it is using 96DPI as a base for pixel conversion to print measurements, that isn’t so good.

Here is a test I ran:

![400 unsized example](400dpi.png)

![400 scaled example](400dpi.png){width=595 height=233}

![400 points](400dpi.png){width=595pt height=233pt}

LaTeX output:

\begin{figure}
\centering
\includegraphics{400dpi.png}
\caption{400 unsized example}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=6.19792in,height=2.42708in]{400dpi.png}
\caption{400 scaled example}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=8.26389in,height=3.23611in]{400dpi.png}
\caption{400 points}
\end{figure}

When the unit is stipulated as points, we get an accurate conversion of how wide 595pts are on paper. Which is of course the intention behind Scrivener using that number.

But, on the other hand, when I convert to HTML, the use of points as units causes it to convert to the same inch measurements, which are not recommended in CSS.

So the other solution is for us to add --dpi 72 to the internal command lines, which causes “595” to equate to 8.3in without having to stipulate a unit. In the above example if I add that to the conversion command, both the second and third images come out correctly, in LaTeX and DOCX.
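
The numbers in the LaTeX output above can be reproduced by hand: a unitless value is treated as pixels and divided by the DPI fallback, while a value in points is divided by 72. A rough sketch of that arithmetic follows. The method name is invented, and this only mirrors the conversion seen in the test, not Pandoc’s internals.

```ruby
# Convert a size to inches, rounded to five decimals as in the LaTeX output.
# units_per_inch is the DPI for pixel values, or 72 for values in points.
def to_inches(value, units_per_inch)
  (value / units_per_inch.to_f).round(5)
end

# Unitless 595 x 233 with the default fallback of 96 DPI
p [to_inches(595, 96), to_inches(233, 96)]

# The same values read as points, or as pixels with --dpi 72
p [to_inches(595, 72), to_inches(233, 72)]
```

The first pair matches the "scaled" figure (6.19792in by 2.42708in) and the second matches the "points" figure (8.26389in by 3.23611in), which is why setting the fallback to 72 makes the unitless values behave like points.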

That doesn’t help people using the Processing pane, who would somehow have to know that they should almost always add that argument because of how Scrivener works. I could certainly add it to the “Basic Pandoc” compile format starter, which would help. And I guess we could maybe add it into an empty arguments field when the Pandoc Syntax checkbox is enabled. Maybe that’s good enough?

In your experience, is that a good solution though? From my testing it looks safe, and is producing the right results.

nontroppo · September 30, 2023, 5:47am

OK!

Yes, it’s a hard call given the differences across all the formats Pandoc supports – I suppose DOCX/ODT / LaTeX / HTML could be considered the major outputs to optimise for. I think your idea to set DPI as a default may be helpful, or at least should improve things. Those that use custom pandoc post-processing will have to deal with this themselves — I suppose specifying why Scrivener does what it does should help us make decisions and it is probably better to underspecify than overspecify to give the post-processing maximum flexibility?

AmberV · September 30, 2023, 10:42am

Yeah, to clarify on that last point, I don’t mean that we should force the dpi argument, basically prepending it behind the scenes, when Pandoc Syntax is ticked. I mean adding it as editable text to the arguments field as a hint. One can freely remove it or change it after that point.

Hmm, though in sleeping on it, maybe a better solution is to reveal a checkbox, enabled by default, when Pandoc syntax is enabled, that toggles it on and off, with some help text to explain what it will do and why that’s generally a good thing if you’re using Pandoc with Scrivener. Then, when it is enabled, it would prepend it behind the scenes. If someone isn’t using Pandoc, or wants to specify their own fallback, they can disable it and handle it however in the Arguments field.

That feels cleaner to me than adding text you have to understand, and then delete if you understand and don’t want it—and maybe you aren’t going to use Pandoc anyway but some wrapper script, etc.

nontroppo · October 2, 2023, 12:19am

Agreed, that sounds like an ideal solution to me!