Kindlegen error about multiple titles with Pandoc epub export

Error from Kindle Previewer when attempting to open an ePub file compiled from Scrivener:

[code]*************************************************************
Amazon kindlegen(MAC OSX) V2.9 build 0830-03578f
A command line e-book compiler
Copyright Amazon.com and its Affiliates 2015


Info:I9026:option: (hidden) amazon creator tool or pipeline
Error(opfparser):E20006: There are more than one title defined in OPF metadata. But none of them is refined with “title-type” as “main” title. Refer http://idpf.org/epub/30/spec/epub30-publications.html#sec-opf-dctitle for more info.
Error(prcgen):E21011: The book title was not set. Please set the title before generating the mobi.
[/code]

Content of content.opf from the ePub:

<?xml version="1.0" encoding="utf-8"?>
<package version="3.0" unique-identifier="epub-id-1" prefix="ibooks: http://vocabulary.itunes.apple.com/rdf/ibooks/vocabulary-extensions-1.0/" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="epub-id-1">urn:uuid:D2A497E8-EB7F-4759-9658-2A5CB9E29D9A</dc:identifier>
    <dc:title id="epub-title-1">NovelTemplate</dc:title>
    <dc:title id="epub-title-2">NovelTemplate</dc:title>
    <dc:date id="epub-date">2018-05-17T14:37:40Z</dc:date>
    <dc:language>en</dc:language>
    <dc:creator id="epub-creator-1">PK</dc:creator>
    <meta property="role" scheme="marc:relators" refines="#epub-creator-1">aut</meta>
    <dc:creator id="epub-creator-2">PK</dc:creator>
    <meta property="dcterms:modified">2018-05-17T14:37:40Z</meta>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="nav" href="Text/nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="style" href="Styles/stylesheet1.css" media-type="text/css"/>
    <item id="title_page_xhtml" href="Text/title_page.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch001_xhtml" href="Text/ch001.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch002_xhtml" href="Text/ch002.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch003_xhtml" href="Text/ch003.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch004_xhtml" href="Text/ch004.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch005_xhtml" href="Text/ch005.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="title_page_xhtml" linear="yes"/>
    <itemref idref="ch001_xhtml"/>
    <itemref idref="ch002_xhtml"/>
    <itemref idref="ch003_xhtml"/>
    <itemref idref="ch004_xhtml"/>
    <itemref idref="ch005_xhtml"/>
  </spine>
  <guide>
    <reference type="toc" title="NovelTemplate" href="Text/nav.xhtml"/>
  </guide>
</package>

The title appears twice, as kindlegen complains:

<dc:title id="epub-title-1">NovelTemplate</dc:title>
<dc:title id="epub-title-2">NovelTemplate</dc:title>
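For anyone who wants to check an OPF file for this condition without running kindlegen, here is a minimal Python sketch. It lists the dc:title elements and reports whether any of them is refined with a title-type of "main", which is what the E20006 message asks for. The function name title_report and the abbreviated SAMPLE_OPF are illustrative assumptions, not part of Scrivener, Pandoc, or kindlegen.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# Abbreviated from the content.opf quoted in this thread (illustrative only).
SAMPLE_OPF = """<package version="3.0" unique-identifier="epub-id-1" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="epub-id-1">urn:uuid:D2A497E8-EB7F-4759-9658-2A5CB9E29D9A</dc:identifier>
    <dc:title id="epub-title-1">NovelTemplate</dc:title>
    <dc:title id="epub-title-2">NovelTemplate</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>"""


def title_report(opf_xml):
    """Return the dc:title values and whether one is refined as the main title."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        return {"titles": [], "has_main": False}

    titles = metadata.findall(f"{{{DC_NS}}}title")

    # Collect the ids targeted by <meta property="title-type" refines="#id">main</meta>.
    main_ids = set()
    for meta in metadata.findall(f"{{{OPF_NS}}}meta"):
        is_main = (meta.text or "").strip() == "main"
        if meta.get("property") == "title-type" and is_main:
            main_ids.add((meta.get("refines") or "").lstrip("#"))

    return {
        "titles": [t.text for t in titles],
        "has_main": any(t.get("id") in main_ids for t in titles),
    }


if __name__ == "__main__":
    print(title_report(SAMPLE_OPF))
```

More than one entry in "titles" combined with "has_main" being False matches the situation kindlegen rejects above.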

How to reproduce

  1. Create a basic novel project.
  2. On the Compile screen, set the “Title” and “Authors” and leave other metadata blank.
  3. Under Metadata on the Project Formats screen, make sure everything is blank.
  4. Compile from Pandoc to ePub. (The problem occurs with both ePub 2 and ePub 3.)

Versions:

Scrivener: 3.0.2
OS X: 10.13.4
Pandoc: 2.2.1, installed via package
Kindle Previewer: 3.22.0

This seems to be a bug in Pandoc. You can check that Scrivener’s side of things is fine by ticking “Save source files in a folder with exported file” in the Compile options. If you open the resulting .xml and .txt files, you will see that there is no problem. These are the files that Scrivener passes to Pandoc.

A Google search confirms that this is indeed a Pandoc issue:

groups.google.com/forum/#!topic … iR_wXqLHxk

It might be worth trying to convert the ePub using KindleGen to see if that works (although I would guess that Kindle Previewer uses KindleGen internally).

This is not a bug in Pandoc. If you compile to Multimarkdown (pandoc flavour), and use pandoc directly:

pandoc -t epub -o test.epub test.md

Then only one dc:title is created. What seems to be happening is that the metadata is not being mapped to the correct YAML keys. So for example this metadata:

[Screenshot: Screen Shot 2018-05-18 at 14.49.44_SMALL.png]

… gets converted to this invalid metadata (Pandoc requires YAML formatting):

% TEST
% Joanna Doe

Whereas when I compile to Pandoc flavoured Multimarkdown I get this by default:

---
Title: TEST  
Author: Joanna Doe
---

This is better but still not valid. Pandoc metadata is case-sensitive, so it should be:

---
title: TEST  
author: Joanna Doe
---

This then generates an EPUB without problems. It should be very easy for Keith to fix this, as it already works for Pandoc output via the Multimarkdown option, with the proviso that the title and author keys should be lowercase by default. The user can change the case of title/author manually, but it would be nice if Scrivener set this by default. Because MMD also supports YAML-style metadata, I did ask previously whether it wouldn’t be easier if Keith just used YAML by default. That would mean just one metadata style to convert, irrespective of the compile type…
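Until something like that is built in, the lowercase point can be handled by post-processing the compiled text. The following is only a rough Python sketch: lowercase_yaml_keys is a hypothetical helper, it assumes the block contains only simple top-level "Key: value" lines, and it is not a real YAML parser.

```python
SAMPLE_BLOCK = "---\nTitle: TEST  \nAuthor: Joanna Doe\n---"


def lowercase_yaml_keys(block):
    """Lowercase the key of each simple top-level 'Key: value' line.

    Delimiter lines (---), indented lines, list items, and comments are
    left untouched. This is a sketch, not a general YAML rewriter.
    """
    out = []
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        simple_key = (
            sep
            and key
            and not line[0].isspace()
            and not key.startswith(("-", "#"))
        )
        out.append(key.lower() + sep + value if simple_key else line)
    return "\n".join(out)


if __name__ == "__main__":
    print(lowercase_yaml_keys(SAMPLE_BLOCK))
```

Values, including the trailing spaces that MMD uses for line breaks, pass through unchanged; only the key names are altered.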

??? Since when did Pandoc require YAML formatting? Scrivener has always output Pandoc metadata using the percent-sign syntax, as per the docs:

pandoc.org/MANUAL.html#metadata-blocks

According to the docs, either the percent-based metadata (pandoc_title_block) or YAML metadata blocks (yaml_metadata_block) can be used, so I’m not sure you are right that this isn’t a bug in Pandoc.

EDIT: Never mind, I see that the documentation specifies a different (YAML) sort of metadata for ePub, which is a bit silly.

Actually, it turns out that this has nothing at all to do with the format of the metadata provided. The problem is that Scrivener is including a metadata block at all, because Scrivener is already providing the metadata as an XML file using --epub-metadata (the same XML metadata as is generated for non-Pandoc epub files). Because Scrivener is providing a metadata block inside the Pandoc file too, Pandoc merges that information into the provided metadata.xml file, resulting in duplicate entries. So I have fixed this for the next update by simply not providing any Pandoc metadata when exporting to epub, since the metadata.xml file is already being fed in.

This really does seem to be a bug in Pandoc. If I omit the metadata from the top of the text, then no title page is generated. But if I provide that metadata to generate a title page, then it also gets merged into the metadata.xml file I provide and results in this problem. That doesn’t seem like intentional behaviour to me.

Keith, you are correct; I’d forgotten that % metadata was a default extension (I assumed it was an MMD compatibility extension). Everyone I know using Pandoc uses YAML, as it is much more flexible and extensible, and integrates more closely with the Pandoc templates and output. Anyway, glad that you figured out the real issue.

I’m not sure if this is helpful, but this is from the Pandoc manual (https://pandoc.org/MANUAL.html#extension-yaml_metadata_block):

The “document” here appears to refer to everything being compiled in one pass (meaning that multiple files are treated as one document). So this does appear to be a bug in Pandoc: according to the documentation, when the same value is set multiple times, Pandoc should prioritize one of the metadata blocks and use that value, not duplicate the values.

I’ve worked around the issue as follows:

  1. In the metadata.xml file passed in to --epub-metadata, I no longer include the title or author elements for Pandoc > ePub.

  2. I add the title and authors in the metadata block at the top of the Pandoc .txt file (using YAML for epub format here in case multiple authors are added).

This ensures that the title page gets created from the metadata block and that there is no duplication in the OPF file. All this is done for 3.0.3, which should be released next week (fingers crossed).
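For readers who hit the same duplication in their own Pandoc pipeline, the two steps can be approximated as in the Python sketch below. It is not Scrivener's actual code: both function names are hypothetical, and the regex assumes a fragment with plain, non-nested dc:title and dc:creator elements. Any meta element that refines a removed creator id would need separate handling.

```python
import json
import re

# Matches a whole <dc:title>...</dc:title> or <dc:creator>...</dc:creator>
# element, plus surrounding indentation and one trailing newline.
_TITLE_OR_CREATOR = re.compile(
    r"[ \t]*<dc:(title|creator)\b[^>]*>.*?</dc:\1>[ \t]*\n?",
    re.DOTALL,
)


def strip_title_and_creators(fragment):
    """Step 1: drop dc:title and dc:creator from the --epub-metadata fragment."""
    return _TITLE_OR_CREATOR.sub("", fragment)


def yaml_metadata_block(title, authors):
    """Step 2: build a YAML metadata block with a title and a list of authors.

    json.dumps is used for quoting because a JSON string is also a valid
    YAML double-quoted scalar.
    """
    lines = ["---", f"title: {json.dumps(title)}", "author:"]
    lines += [f"- {json.dumps(name)}" for name in authors]
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fragment = (
        "<dc:title>TEST</dc:title>\n"
        "<dc:creator>Joanna Doe</dc:creator>\n"
        "<dc:language>en</dc:language>\n"
    )
    print(strip_title_and_creators(fragment))
    print(yaml_metadata_block("TEST", ["Joanna Doe"]))
```

The stripped fragment would go to --epub-metadata and the YAML block to the top of the text file, so the title and authors reach Pandoc from one source only.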

All the best,
Keith