Compiling to DOCX removes hyphens from linked image names

ptram · February 27, 2024, 3:59pm

Hi,

I’m using the latest version of Scrivener on macOS Monterey. When exporting to DOCX, hyphens in the names of linked images are removed.

So, for example, if I inserted an image linked from a file named “my-image.jpg”, the name is changed to “myimage.jpg” in the compiled DOCX file.

This is a problem, since the hyphens let one understand the name of the file in a way that would not be possible without them. This becomes particularly relevant when dealing with a large number of images, each one using parts of the name to be distinguished from the others.

Paolo

xiamenese · February 27, 2024, 5:02pm

Have you tried compiling to RTF and opening that in Word? Does it still happen? Doing that would help identify if it is KB’s DOCX exporter that is removing the hyphens or whether it is Word doing it.

Mark

ptram · February 27, 2024, 5:43pm

I didn’t!

Examining the compiled RTF file, I can see that the original image file name is preserved. However, the file path is removed. Apparently, the file can only be embedded, not linked.

As for checking if a path exists from inside Word for Mac, I admit I’ve not found a way.

As for Apple Pages: while it opens DOCX documents quite well (even if with character styles flattened to text attributes), it doesn’t include images when opening an RTF file, and all styles are flattened.

I also compiled as RTFD, and could find the linked images copied inside it, and with the original name. When opening the RTFD file, Pages has the images, but still the text styles are flattened. Word can’t open this type of file.

When only opening the RTF part of the RTFD file, both Word and Pages open the text without styles and images.

So, the naming issue seems to only exist in the DOCX compiler. But DOCX looks like the best format for exchanging with word processors.

Paolo

ptram · March 6, 2024, 4:06pm

I would kindly ask @KB if he plans to change this behavior in a future update. This would completely change the way I plan to use this feature.

Paolo

rms · March 6, 2024, 4:45pm

I’ve tried to replicate this. See the attached Scrivener project and compiled DOCX. Using default settings to compile, the DOCX file includes the images without any link back to the file names where they originate.

What did you do differently?

Untitled.docx.zip (337.7 KB)
Untitled-bak-2024-03-06T16-54.zip (162.9 KB)

ptram · March 6, 2024, 8:28pm

As far as I can see, the linked image paths are broken and the files are renamed in your compiled DOCX file as well.

Your linked images, in Scrivener, have this path and name:

And this is how the image appears in the DOCX code:

</w:rPr><w:drawing><wp:inline><wp:extent cx="5943600" cy="4495800"/><wp:docPr id="1025" name="Screenshot_20240302_at_135059.png"/><wp:cNvGraphicFramePr/><a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"><a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture"><pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"><pic:nvPicPr><pic:cNvPr id="1025" name="Screenshot_20240302_at_135059.png"/>

etc.
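For anyone who wants to repeat this check without unzipping the file by hand, here is a minimal Python sketch. It relies only on the fact that a DOCX is a ZIP archive with the main text in word/document.xml and embedded images under word/media/. The function name `docx_image_names` is my own, and matching on the local tag name `docPr` is a simplification that happens to fit the snippet above.

```python
import zipfile
import xml.etree.ElementTree as ET


def docx_image_names(docx_path):
    """Return (media_files, drawing_names) found inside a DOCX archive."""
    with zipfile.ZipFile(docx_path) as archive:
        # Embedded image files are stored under word/media/ in the package.
        media_files = sorted(
            name.split("/")[-1]
            for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )
        root = ET.fromstring(archive.read("word/document.xml"))
    # Compare only the local part of each tag, so the namespace prefix
    # used by the exporter does not matter.
    drawing_names = [
        el.get("name")
        for el in root.iter()
        if el.tag.endswith("}docPr") and el.get("name")
    ]
    return media_files, drawing_names
```

Calling `docx_image_names("Untitled.docx")` (a hypothetical path) should return the file names in the media folder and the `name` attributes of the drawings, which can then be compared with the original file names.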

There are images embedded, and stored in the “media” section of the DOCX:

Names are changed compared to your original ones. Spaces have been converted to underscores, while dots and hyphens have been stripped out.
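To make the observed rule concrete, here is a small Python sketch of it. This is only my reading of the compiled output, not Scrivener's actual code: spaces become underscores, hyphens and dots in the base name are dropped, and the extension is kept. The function name `observed_docx_name` is hypothetical, and I am assuming that existing underscores pass through unchanged.

```python
from pathlib import PurePath


def observed_docx_name(original_name: str) -> str:
    """Approximate the renaming seen in the compiled DOCX (an assumption)."""
    path = PurePath(original_name)
    # Spaces are converted to underscores.
    stem = path.stem.replace(" ", "_")
    # Hyphens and dots inside the base name are stripped out.
    for ch in "-.":
        stem = stem.replace(ch, "")
    # The final extension (for example ".jpg") is kept as it is.
    return stem + path.suffix


print(observed_docx_name("my-image.jpg"))  # prints myimage.jpg
```

The printed result matches the "my-image.jpg" to "myimage.jpg" change described in the first post.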

Paolo

rms · March 7, 2024, 6:27am

Ah, this now explains what you are talking about. In the ~17 years that DOCX has existed, I never dug into the XML like you seem to need to. I guess the work-around would be to not use hyphens in image files if how it works now breaks things for you.

ptram · March 7, 2024, 10:09am

If there is no other solution, this is a possible workaround. But I would like to first check for a solution in the compiler, since

a) removing any separation in the file name is not viable, since it makes the thousand images in a project impossible to recognize, and

b) words separated by underscores cannot be navigated with the Opt-Arrow key commands, which makes the whole editing process more tiring.

I’m hoping this is something that can be done and is just some sort of bug.

Paolo

ptram · March 16, 2024, 11:35pm

By the way, I’ve seen that some authoritative sources suggest avoiding underscores to separate words, preferring hyphens. They also suggest not suppressing the separation between words.

Image & File Naming Conventions : BSPH Website Help Desk.

It looks like I’ve always, by instinct, followed these recommendations!

Paolo

kewms · March 17, 2024, 4:47am

What is your ultimate goal here? In particular, why are you relying on a machine-readable name, like the path to the image, rather than a human-readable name, like an image caption? If the image itself appears in the DOCX file, why do you care what Word calls it?

ptram · March 17, 2024, 9:58pm

What I want to do is to keep the images linked, ready to use for the various source or output formats (Scrivener, Word, InDesign/Affinity Publisher, the web), easy to replace at once for all the outputs.

The images will be managed on disk, and a readable name will help in finding them and identifying their content. I’m continually updating the images while the product development continues, and while proofreading of my drafts calls for further editing.
When importing the DOCX file into InDesign or Affinity Publisher, the images will have to be made linked, to avoid making the page layout file too heavy. When making the embedded images linked, ID and AfPub can relink to the existing, original files, or create a new set of images with different names. I would want to maintain only a single set of images, so that only a single instance has to be updated when changes happen.
I rarely use image captions. Most of my work consists of describing images in detail, so the whole text is the caption. This doesn’t allow for easy identification of the corresponding image file, since many of them are very similar, maybe only different for a modal dialog, or the like. I need a clear path to identify them from the name (something like "home-main-perform-page_menu-option_a.png"). In any case, the project I’m working on currently has 1068 image files, and a clear archiving system would be essential.

Paolo

kewms · March 17, 2024, 10:04pm

But the DOCX file incorporates the images into itself. Test this to be sure, but I don’t think changing the image on disk (outside the DOCX file) is going to change what Word sees.

ptram · March 17, 2024, 10:27pm

Yes, but if you import the DOCX file into InDesign or Affinity Publisher, you can convert the images from embedded to linked.

It’s an easy procedure, where you choose to make the images linked, and the program asks you to select the folder where the original files are stored. ID/AfPub compare the name saved with the DOCX file, and relink the image to the file on disk.

At this point, the ID/AfPub file is again using linked images. No longer embedded ones.

As I wrote above, however, if the images embedded in the DOCX file have a name different from the original, relinking will not be possible, and the page layout program will have to generate a new set of files.
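To show why the renaming matters for relinking, here is a tiny Python sketch of a name comparison. It assumes an exact, case-sensitive match on the file name. That is my simplification; I don't know the precise matching rules ID or AfPub apply. The function name `relink_report` is hypothetical.

```python
def relink_report(embedded_names, folder_names):
    """Split embedded image names into matched and unmatched lists.

    Assumption: a relink succeeds only when a file with exactly the
    same name exists in the chosen folder.
    """
    on_disk = set(folder_names)
    matched = [name for name in embedded_names if name in on_disk]
    unmatched = [name for name in embedded_names if name not in on_disk]
    return matched, unmatched


matched, unmatched = relink_report(
    ["myimage.jpg", "logo.png"],   # names as stored in the DOCX
    ["my-image.jpg", "logo.png"],  # names of the originals on disk
)
print(matched)    # ['logo.png']
print(unmatched)  # ['myimage.jpg']
```

With a thousand images, every name that lost a hyphen would end up in the unmatched list and need manual attention.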

To give some more details about my goals: I want to single source a project in Scrivener, and publish it into several output channels:

Website, to be done with the help of Quarto. Images will remain linked (a Markdown link will exist in my Scrivener project, with a condition to make it available when compiling to Markdown).
InDesign or Affinity Publisher, to be done with an intermediate DOCX file. Scrivener will compile a DOCX file with embedded images. When it is imported into ID/AfPub, I’ll be able to make the images linked again, and build a page layout project that will look as if originally made in that page layout program.

Having a page layout version is important for two reasons:

It will be required by some partners who still want to work with a page layout project.
It will be the best way to make a PDF file, with all the control a page layout program can offer. I would love to be able to use Typst, but I feel it will not be ready soon. In any case, I’ll need a page layout version for the reason described above.

Paolo

ptram · March 19, 2024, 5:18pm

I don’t know if this chart makes my workflow clearer or even more intricate. (But I wanted to experiment with Mermaid…).

ptram · March 22, 2024, 4:29pm

I know that L&L don’t answer on future plans, and I don’t want to be annoying with my requests. But since this will have a huge impact on my future workflow, I have to try and ask if a future version of Scrivener will preserve the name of the images linked to files when compiling or exporting.

The DOCX compiler is the fastest, most complete way to go from Scrivener to a page layout program, and keep the two in sync. The difference is therefore that:

if the names are preserved, all I’ll have to do to link all the images to the original folder is to choose that folder; but
if the names continue to be changed, each image will have to be relinked to the original file after importing the DOCX file into the page layout document.

The difference is in being able to make Scrivener the brain of a project even after multiple versions; or have to abandon it as soon as the project takes the shape of an early draft.

My heart knows what it would love.

Paolo

AmberV · March 22, 2024, 6:40pm

I was unable to find an easy answer to my main question, but if I had to speculate, I bet there are pretty strict rules about what characters can be used for assets, and Scrivener is following them (why would it change things otherwise?). I tested or examined the products of several other tools and word processors:

Pandoc completely rewrites the image name using a serialised pattern, ‘rId[num].[ext]’. This naming scheme corresponds to the internal ID pattern used by OpenXML to map file assets to central unique identifiers, used elsewhere by other files in the archive (the document.xml file does not refer to images by name).
LibreOffice does something very similar, though it uses a less opaque serialising technique, renaming everything to “image[num].[ext]”.
AbiWord uses a serialised naming scheme similar to Pandoc’s.
WordPad.exe not only serialises to a name similar to what LibreOffice uses, but converts the image’s file format to WMF. Stay classy, Microsoft.
MS Word, as best as I can tell, also serialises using the ‘image[num]’ pattern, but does preserve the original file format.
Scrivener (macOS native) preserves the intent of the given image name, which in turn is set by default from the original filename, but presumably in order to avoid breaking XML and potentially other specification requirements, strips out some characters.
Scrivener (macOS Aspose / Windows Aspose) serialises the image names using the ‘image[num]’ pattern.
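For reference, the ID-to-file mapping mentioned in the Pandoc entry can be read from the package's relationships part, word/_rels/document.xml.rels. Here is a minimal Python sketch for listing it. The function name `image_relationships` is my own, and filtering on a relationship type that ends in "/image" is a shortcut that I believe covers the usual cases rather than a complete reading of the specification.

```python
import zipfile
import xml.etree.ElementTree as ET


def image_relationships(docx_path):
    """Map relationship IDs (e.g. rId5) to image targets (e.g. media/image1.png)."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter()
        # Compare only the local tag name, and keep image relationships only.
        if rel.tag.endswith("}Relationship")
        and (rel.get("Type") or "").endswith("/image")
    }
```

Running this on files produced by the different tools is a quick way to compare their naming schemes side by side.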

Overall, it does not seem any software at all would be conducive to the workflow you are proposing, and if anything Scrivener-macOS-native is probably the most permissive.

Meanwhile, are you linking to images in Scrivener? Are your file names procedural in the sense that they have a predictable naming structure (rather than chaotic, as in some being stock system default screenshot names, etc.)?

If the stars align, you could probably get away with using a bulk file renaming tool to conform your image names to something more DOCX-internal-friendly, let them all break in Scrivener, and use project replace with regular expressions to fix them all back. Those broken image link messages in the editor are text. The intent is that you can fix the path right in the editor, reload, and fix the link. Search and replace would do that at a much greater scale.

Obviously I would only embark on such an experiment with duplicate data—both the image folder and the .scriv.

ptram · March 22, 2024, 11:46pm

Ioa, thank you very much for taking the time to examine this issue!

I know that my image naming schema, however coherent, can’t be generalized. Too many modes and cases make it unlikely.

I can see two different solutions, to still generate materials for creating a PDF and editable materials for translating:

A) Directly from a word processor. If the main output channel has to be a Website, the PDF can be a little less graphically refined than in the past. So, the workflow could be:

Scrivener → DOCX → Word → PDF

My partners who want to only deliver a PDF file will receive a Word file, with embedded images, and work on it. It’s an “industry standard format”, so nobody can legitimately contest it.

Updates will happen with a new Word file, where I’ll highlight the text and the images to be replaced. No renaming involved: just copy & paste from my updated source file to their targets.

B) Page layout, derived from the DOCX file compiled from Scrivener. InDesign/Affinity Publisher will generate the linked image files from the embedded images.

Since Scrivener is so kind as to preserve most of the original file names, I’ll just have to duplicate the original ones, strip the hyphens and other characters that are not converted by the compiler, and supply these replacement files to my partners.

Compared to the current procedure, where all the work is done in InDesign, the only difference will be this renaming thing.
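The duplication step could probably be scripted. Here is a rough Python sketch, assuming the compiler's rule is "spaces to underscores, hyphens and dots stripped, extension kept", which is only what I observed in the compiled files. The function names and folder paths are hypothetical. The script copies rather than renames, so the originals stay untouched, and it stops if two originals would collapse to the same name.

```python
import shutil
from pathlib import Path


def docx_style_name(name: str) -> str:
    """Assumed compiler rule: spaces to underscores, hyphens and dots removed."""
    p = Path(name)
    stem = p.stem.replace(" ", "_").replace("-", "").replace(".", "")
    return stem + p.suffix


def duplicate_with_docx_names(source_dir, target_dir):
    """Copy every file in source_dir to target_dir under its DOCX-style name."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    mapping = {}
    for original in sorted(source.iterdir()):
        if not original.is_file():
            continue
        new_name = docx_style_name(original.name)
        # Two originals such as "a-b.png" and "ab.png" would end up identical.
        if new_name in mapping.values():
            raise ValueError(f"name collision on {new_name!r}")
        shutil.copy2(original, target / new_name)
        mapping[original.name] = new_name
    return mapping
```

The returned mapping (original name to new name) can be kept as a record of which replacement file corresponds to which source image.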

And there is also the option of leaving all the images embedded in the page layout program. I don’t know how much this will slow my mammoth projects, but it is still an option.

Paolo

FamilyPuzzleSolver · March 22, 2024, 11:54pm

So I was reading the manual on an unrelated topic, but I wonder if it could be used as a workaround in @ptram Paolo’s situation.

I was looking at inline annotations etc. The one thing that struck me was the ability to retain URLs within the inline annotations.

Is it possible to use these annotations with the caption, and print them for the first phase that goes to all the other apps? Since they can be set to print (or not) when compiled, would using inline annotations help, at least for that workflow, with photo editing and file naming consistency?

ptram · March 23, 2024, 8:54pm

The process I’m after has to be as automatic as possible. So, adding names manually wouldn’t be of much help.

In any case, the name and path of the original files will be in the original Scrivener project, as the Markdown link that will be included when compiling for the Website, but will be excluded when compiling to Word.

Paolo