Multiple Word document export

I was recently trying to adapt the splitter script from here:

to produce multiple .docx files instead of .tex files.
I set the file extension in the script (FILE_EXTENSION = 'docx') and changed the pandoc line accordingly to pandoc -t docx --bibliography=$HOME/.pandoc/Bibliography.bib -N --reference-doc=$HOME/.pandoc/templates/refdoc.docx -o #{filename} #{tmpfile.path}, yet I don’t get any output at all.

My log says " Initiating with Ruby 2.3.7
Pandoc: /usr/local/bin/pandoc | V: 2.7.2"
and the rest remains empty too.

Is there anything else I need to modify?

Any of the markdown experts able to help out here?

It would help if you could create a sample project with all the files needed to reproduce this. I don’t have much time at the moment, but it should be working…

Here’s what I’ve tried … (251 KB)

OK, there was an issue with the splitter script: it expected the file extension to be 3 letters long (tex was OK but docx broke). I’ve added better log output to make any errors here more obvious, and cleaned up a bit.
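
To illustrate the kind of failure described above, here is a minimal sketch. The three-character pattern is a hypothetical stand-in, since the original check is not shown in this thread; the second pattern is built from a FILE_EXTENSION constant, so it works for an extension of any length. The helper name extension_ok? is also just for illustration.

```ruby
# Hypothetical stand-in for a check that assumes three-letter extensions
THREE_LETTER = /\.\w{3}$/

# Pattern derived from a configurable extension of any length
FILE_EXTENSION = 'docx'
BY_CONSTANT = /\.#{Regexp.escape(FILE_EXTENSION)}$/

# Illustrative helper: true when the file name ends with the expected extension
def extension_ok?(filename, pattern)
  !(filename =~ pattern).nil?
end

puts extension_ok?('1-RedBook.tex', THREE_LETTER)   # tex passes the three-letter check
puts extension_ok?('1-RedBook.docx', THREE_LETTER)  # docx does not
puts extension_ok?('1-RedBook.docx', BY_CONSTANT)   # docx passes the derived check
```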

In addition, for this workflow to work you must also edit the compile format so that it produces the file names the splitter script expects; they were hard-coded to tex before. I’ve added a custom metadata field (fileType) and used placeholders to insert it instead, and removed the file extension for references:
Screenshot 2019-05-06 at 09.37.46_SMALL.png

Screenshot 2019-05-06 at 09.38.51_SMALL.png

Screenshot 2019-05-06 at 09.38.20_SMALL.png
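
For reference, here is a rough sketch of the compiled input the splitter expects: each section starts with the split marker followed by the output file name, and the final section is named references with no extension. The marker and file names come from the script and log in this thread; the sample text, metadata values, footnote, and the section_names helper are invented for illustration.

```ruby
SPLIT_MARKER = '>>>> '

# Invented sample of compiled output; only the marker lines follow the thread
sample = <<~INPUT
  title: Sample
  author: Someone

  >>>> 1-RedBook.docx
  # Red Book

  Some text.[^cf1]

  >>>> 2-BlackBook.docx
  # Black Book

  More text.

  >>>> references
  [^cf1]: A footnote.
INPUT

# Illustrative helper: list the file name that heads each section
def section_names(text, marker)
  # The first chunk is the metadata block, so drop it
  text.split(marker).drop(1).map { |chunk| chunk.lines.first.to_s.chomp }
end

names = section_names(sample, SPLIT_MARKER)
puts names.inspect
```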

This now compiles to multiple docx files:

=== ------------------------------------------------------ ===
=== Splitter V1.0.1   Report @ 2019-05-06 09:33:12 +0800   ===
=== ------------------------------------------------------ ===
 Working directory: /Users/ian/Desktop/Compile_mmd
 Initiating with Ruby 2.3.7
Pandoc: /usr/local/bin/pandoc | V: 2.7.2
 Input document is 142 lines long...
 Input is comprised of 4 sections (including metadata and references)...

:: Running: pandoc -t docx --bibliography=core.bib --csl=csl/apa.csl -M reference-section-title=References -N --reference-doc=templates/custom.docx -o 1-RedBook.docx /var/folders/my/y9spf62925l2l97hn695ry200000gn/T/1-RedBook.docx20190506-72739-1ojrfnv ::
:::: [WARNING] Note with key 'cf3' defined at line 88 column 1 but not used.
:::: [WARNING] Note with key 'cf4' defined at line 92 column 1 but not used.
:: exit status: pid 72750 exit 0 

:: Running: pandoc -t docx --bibliography=core.bib --csl=csl/apa.csl -M reference-section-title=References -N --reference-doc=templates/custom.docx -o 2-BlackBook.docx /var/folders/my/y9spf62925l2l97hn695ry200000gn/T/2-BlackBook.docx20190506-72739-a39eak ::
:::: [WARNING] Note with key 'cf1' defined at line 80 column 1 but not used.
:::: [WARNING] Note with key 'cf2' defined at line 84 column 1 but not used.
:: exit status: pid 72752 exit 0 

Splitter.rb V1.0.1:

#!/usr/bin/env ruby -wU
# encoding: utf-8
# frozen_string_literal: true
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
require 'tempfile'
require 'open3' # ruby standard library class to handle stderr and stdout

VERSION = '1.0.1'
SPLIT_MARKER = '>>>> '
FILE_EXTENSION = 'docx' # extension expected on each section's file name
CMD = 'pandoc -t docx --bibliography=core.bib --csl=csl/apa.csl -M reference-section-title=References -N --reference-doc=templates/custom.docx -o'
matchExtension = /\.#{FILE_EXTENSION}$/

puts "\n=== ------------------------------------------------------ ==="
puts "=== Splitter V#{VERSION}   Report @ " + Time.now.to_s + '   ==='
puts '=== ------------------------------------------------------ ==='
puts ' Working directory: ' + `pwd`
puts " Initiating with Ruby #{RUBY_VERSION}"
puts `echo "Pandoc: $(which pandoc) | V: $(pandoc -v | sed -nE '1 s/^pandoc // gp')"`

# Split the input file by markers, writing the contents into temp files. Access temp file with file_chunks['name_of_file.txt']
file_chunks = {}
metadata_block = nil
input = ARGF.readlines
puts " Input document is #{input.length} lines long..."
chunks = input.join.split(SPLIT_MARKER)
puts " Input is comprised of #{chunks.length} sections (including metadata and references)..."
chunks.each do |chunk|
	# Store the first chunk as MMD metadata, to be added to each temp file
	unless metadata_block
		metadata_block = chunk
		next
	end
	next if chunk.length < 1

	chunk.gsub!(/<gl>(.+?)<\/gl>/) { $1.gsub(/([^-=\s]+)/, '<gl>\1</gl>') } # make sure sequences of glosses have enclosing tags

	# The first line of each chunk is the output file name; the rest is content
	filename, *lines = chunk.split("\n")
	unless filename =~ matchExtension || filename == 'references'
		puts ' !File extension is incorrect!'
		next # skip sections whose name does not match the expected extension
	end
	tf = Tempfile.new(filename)
	# Add metadata to top of each temp file; unless we're in the reference list
	tf.print metadata_block unless filename == 'references'
	tf.print lines.join("\n")
	file_chunks[filename] = tf
end
if file_chunks.empty?
	puts ' Could not properly split file, check errors!'
	exit 1
end
# Pull out the references.txt temp file. We will append it to the bottom of each document that we process. It contains figure and footnote references. Pandoc will ignore any that do not apply to the section, so this can be done blindly.
references = file_chunks.delete('references')
file_chunks.each_pair do |filename, tmpfile|
	if references
		references.rewind # read the references from the start for every section
		tmpfile.print "\n\n"
		tmpfile.print references.readlines.join("\n")
	end
	tmpfile.flush # make sure pandoc sees the full contents on disk
	cmd = "#{CMD} #{filename} #{tmpfile.path}"
	puts "\n:: Running: #{cmd} ::\n"
	# popen2e merges stdout and stderr so pandoc warnings show up in the log
	Open3.popen2e(cmd) do |_stdin, oe, thread|
		while (line = oe.gets)
			puts ':::: ' + line.chomp
		end
		exit_status = thread.value
		puts ":: exit status: #{exit_status} \n"
		puts '!!!---RETURNED non-zero value---!!!' unless exit_status.success?
	end
end

Another small issue in your test project: the compile metadata names for “Title” and “Author” should be lowercase for Pandoc… (236 KB)
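
Pandoc's metadata field names are case-sensitive, and its default templates read the lowercase title and author fields, so capitalized keys are not picked up. Normally you would simply rename the fields in the compile metadata; as a rough sketch of the same fix in code, the hypothetical helper below lowercases those two keys in a block of "Key: value" lines. The helper name and the sample block are assumptions and are not part of the splitter script.

```ruby
# Hypothetical helper: lowercase the Title/Author keys at the start of a line,
# leaving every other metadata key untouched
def lowercase_pandoc_keys(metadata)
  metadata.gsub(/^(Title|Author)(?=\s*:)/) { |key| key.downcase }
end

# Invented sample metadata block
block = "Title: The Red Book\nAuthor: Someone\nBaseHeaderLevel: 1\n"
puts lowercase_pandoc_keys(block)
```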

Thank you so very much!
This is perfect.

I ended up using [code]

instead of the custom fileType metadata, because the goal is to be able to export the same sections either as separate tex files or as separate docx files. With the metadata approach I would have had to change the section type each time, so here it’s more economical to have two different compile formats.

But thank you also for improving the script more generally (I’ve adapted the tex version accordingly). I’ve learned a ton :)