Word DOCX Import and Split by outline drops document titles when page breaks are present

melindakraft · September 5, 2019, 1:07am

I have a Word DOCX file with page breaks between the chapters. Each chapter starts with an H1 heading, followed by body text in normal formatting, and ends with a page break.

When I use Import and Split with the outline option ("remove first lines" is not checked), the first few chapters come in normally, but the rest come in with no titles. The problem seems to depend on where the page break sits in the document. When the page break is inserted while the cursor is at the beginning of the next chapter title, it appears on the previous page as expected, but it doesn't span the width of the page (when paragraph view is on).

When I put the page break at the end of the last paragraph before the next heading, Word draws the break all the way across the page and inserts an empty paragraph at the top of the next page, which I deleted manually so that the heading was on the first line.


After an import and split, the headings that follow this second kind of page break come in with no titles. I'm attaching my Word DOCX file (Dorothy and the Wizard in Oz, which is in the public domain) for you to test with: https://drive.google.com/file/d/14lG3n9AedaLXOrk_1eAxsBrdCFHTWpgB/view?usp=sharing
You will see that the first few chapters import properly, but as soon as the importer encounters that full-width page break, the titles are no longer added to the documents.

For now, I have a workaround: I make sure each break is inserted while the cursor is sitting on the line of the chapter heading. It is a quirky thing, to be sure.

kewms · September 5, 2019, 1:20am

Another alternative might be to use Word's Find and Replace feature to replace the page breaks with some other delimiter.

What version of Scrivener do you have?

Katherine

melindakraft · September 8, 2019, 5:34pm

I am running Mac version 3.1.3.

Thank you for the alternative find-and-replace workaround.

I have no idea what the difference between the two kinds of page breaks is, only that one of them seems to trick Scrivener into not parsing the following heading as a document title.