Split text into several files, reassemble it when compiling

M55 · March 21, 2021, 10:10am

Hi everyone,

just a quick question: in a structure like the one shown in this image

[attachment=0]scriv1.jpg[/attachment]

I would like the compile output to pull the sub-chapter content from the text files, so that if Text 1 contains “…bla bla bla end of text 1 ” and Text 2 contains “start of text 2 bla bla bla…”, Sub-Chapter 1 compiles to

…bla bla bla end of text 1 start of text 2 bla bla bla…

without any added breaks, spaces or new lines and the whole of Sub-Chapter 1 still formatted according to the section layout that is assigned to it.

I can’t figure out how to define the section layouts for Text and Sub-Chapter. Can anybody give me a hint or am I trying to do something that’s just not possible?

Thanks,

Marc

Merx · March 21, 2021, 10:57am

Does this similar thread help?

https://forum.literatureandlatte.com/t/compiling-without-any-separators/93147/1

Merx

M55 · March 21, 2021, 12:45pm

Thanks Merx - kind of. I was hoping to avoid the additional step, but it’s a possible workaround. My bigger problem is that the section layout settings for the “Sub-Chapter” level don’t include the “Text” level, for example “Number of words to make uppercase” has no effect, probably because the “Sub-Chapter” actually doesn’t have any words.

I guess that’s just how Scrivener works and I need to rethink the layout.

gr · March 21, 2021, 1:49pm

You can do this. You can make a custom compile that removes the minimal carriage return between text docs. In this way you can even have sentences that span documents and which will get fused on compile. The trick is to realize that the Replacements specified in compile are performed after any separators between sections are added.

Suppose you have assigned a section type T to all the text docs you want to treat – that is, whenever two such text docs are adjacent in the binder, you want them to compile as truly continuous text.

Edit compile settings as follows:

1. In the Separators section, find the appropriate type for the text docs you want to treat (all adjacent text files? or just adjacent text files assigned to a certain section layout?) and specify a Separator Between Sections. Set that to Custom and type something that will not otherwise occur in your text. Suppose we use the symbol ‘@’ by way of example.
2. In the Replacements section, add a replacement entry to Replace string: \n@\n
   Leave the With field blank (assuming you don’t want to introduce even a space character between docs). Check the RegEx option.

That’s it!

Alternative: You could instead skip step (2) above and use the Replacements tab on the right side of the Compile dialog box instead. This would be useful if you wanted to be able to toggle the replacement on and off. If you went this route, you might also want to include a replacement that replaces ‘@’ with nothing. In this way you could switch the fusion on (enable the former, disable the latter replacement) or switch fusion off (disable the former, enable the latter replacement).
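The mechanism described above can be sketched outside Scrivener in a few lines of Python. This is only an illustration, not Scrivener's actual compile code: the function name `compile_docs` and the `fuse` flag are made up for the example, and it assumes the custom separator ends up on its own line between adjacent documents before the replacements run.

```python
import re

SEPARATOR = "@"  # custom separator; assumed never to occur in the manuscript text


def compile_docs(docs, fuse=True):
    """Hypothetical stand-in for the compile step: insert separators, then replace."""
    # Assumption: the separator is placed on its own line between adjacent docs.
    joined = f"\n{SEPARATOR}\n".join(docs)
    if fuse:
        # Replacement 1 (RegEx): remove the separator AND the line breaks around it.
        return re.sub(r"\n@\n", "", joined)
    # Replacement 2: remove only the separator character, keeping the line breaks.
    return joined.replace(SEPARATOR, "")


docs = ["...bla bla bla end of text 1 ", "start of text 2 bla bla bla..."]
fused = compile_docs(docs, fuse=True)
unfused = compile_docs(docs, fuse=False)
print(fused)  # ...bla bla bla end of text 1 start of text 2 bla bla bla...
```

With `fuse=False` the two documents stay on separate lines, which corresponds to enabling only the ‘@’-to-nothing replacement.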

gr

brookter · March 21, 2021, 2:41pm

GR has explained how to join the ‘Text sections’ together (as I was typing a reply… :).

You can also make the first X words of the subchapter be upper case.

Instead of having a separate binder document for the first ‘text’, just put that content into the text of the Subchapter itself, and set it to have the first X words in upper case.
In the Separators section, set your custom separator for your Text section type/layout as BOTH ‘before’ and ‘between’.

Here’s a dummy project illustrating the binder layout and output:
[attachment=0]Screenshot 2021-03-21 at 14.40.09.png[/attachment]

The Separator panel looks like this:

[attachment=2]Screenshot 2021-03-21 at 14.23.34.png[/attachment]

And the replacement screen like this

[attachment=1]Screenshot 2021-03-21 at 14.24.03.png[/attachment]
You only have to set all this up once.

[BTW, of course you could just have a fifth Section Type and Layout to take care of the first text in a subchapter, but there’s really no point when you can use the Subchapter itself…]

HTH.

Merx · March 21, 2021, 3:33pm

gr and brookter, many thanks.

Merx

kewms · March 21, 2021, 3:59pm

Note that you can create whatever additional Section Layouts you need.

Katherine

M55 · March 22, 2021, 8:26pm

Thanks everybody for all the replies.

It looks like that’s not even necessary and I could just use ‘first X words in uc’ in the Text sections - only the first Text section ever follows a page break, so the result looks exactly as if the actual text were contained in the Sub-Chapter files.

I tried to follow your advice, but no luck. Here’s what I did:

[attachment=0]scriv-concat.jpg[/attachment]

I think you’re correct that the replacements are applied after the addition of separators, but it looks like they are also applied only after the text is converted to HTML, at least in my case. So I tried removing the complete

<p class="separator">XXXX</p>

and that did work.

Next I tried a regex that should match the structure around the XXXX separator and replace it with a concatenation of the two paragraphs around it: regex101.com/r/FYJ6vS/1 seems to work fine in the tester, but I couldn’t get the same result in Scrivener (I know that HTML and regex isn’t exactly the best idea anyway).

@Katherine: can you comment on the order of Replacements and if I butchered the regex somehow?

Thanks,

Marc

kewms · March 22, 2021, 9:28pm

Yes, Compile format Replacements are invoked after the text is converted to HTML, as one of the uses for Replacements is to save you from having to type the full HTML (or LaTeX) syntax for complex elements. See Section 23.4.4 in the Scrivener manual. (You’re actually using a Replacement in the Compile Format, discussed in Section 24.15, but the syntax is the same.)
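A small Python sketch of why the order matters. The HTML below is an assumed, simplified shape based on the `<p class="separator">XXXX</p>` fragment quoted earlier in the thread; the real compiled markup will differ in its classes and wrappers.

```python
import re

# Before HTML conversion (assumed): the separator sits alone on its own line.
text_stage = "end of text 1\nXXXX\nstart of text 2"

# After HTML conversion (assumed, simplified): the separator is its own paragraph.
html_stage = (
    "<p>end of text 1</p>\n"
    '<p class="separator">XXXX</p>\n'
    "<p>start of text 2</p>"
)

# The newline-flanked pattern only matches the plain-text stage.
pattern = r"\nXXXX\n"
matches_text = re.search(pattern, text_stage) is not None
matches_html = re.search(pattern, html_stage) is not None

# Removing the whole separator paragraph works on the HTML,
# but the neighbouring texts remain two separate paragraphs.
stripped = html_stage.replace('<p class="separator">XXXX</p>\n', "")
print(matches_text, matches_html)
print(stripped)
```

So a replacement written against the plain-text form of the separator has nothing to match once the text has been converted, and deleting the separator paragraph alone does not join the surrounding text.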

I’m not very familiar with regex myself, so I’ll leave that part of your question to others.

Katherine

gr · March 23, 2021, 2:45am

Marc,

But why are you wildcarding so many parts of the target string? The bit of html you are trying to replace will always be exactly the same. It is a constant. So take that exact string, flank it with newline specifications and then insert backslashes in front of any of the inner characters that need escaping. Isn’t the regex you want just this:

 <p class="separator">XXXX<\/p>

gr

Yeah, did not know you were aiming for epub. I tested my original write up only with pdf as target format.

brookter · March 23, 2021, 3:12am

Doesn’t it have to be a bit more complicated than that though? You’re removing the separator line, but you’re still leaving the </p> from the preceding section, and the <p class= etc. from the next one, so won’t it still treat them as separate paragraphs? I think you may also need to escape the ".

I tried doing this outside compilation (as a test with vim regex) with

:%s/<\/p>\n<p class=\"separator\">XXXX<\/p>\n<p id=\"doc\d\+\">//g

and it worked (though it’s probably quite fragile! and dependent on my test project setup rather than being general.) I haven’t tried putting that through the replacement dialogue yet as I should have been in bed hours ago…

M55 · March 23, 2021, 8:52am

@gr: Sorry, I should have mentioned I’m compiling to epub. My bad. The regex you posted would leave

</p></div><div class="snippet"><p class="ps2" id="doc11">

between the two Text blocks (just like Brookter said), so they wouldn’t be merged together.

@Brookter: You’re right of course, and the regex I actually came up with was https://regex101.com/r/FYJ6vS/1/, but I couldn’t make it work in Scrivener. I agree this is messy and doesn’t look reliable at all. Using vim or sed for this might work, but I think I’ll try compiling to Markdown first. That should be a lot easier to clean up.
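A minimal sketch of what such an external cleanup pass could look like, in Python rather than vim or sed. It is not the regex from the regex101 link and has not been run through Scrivener's Replacements. The sample markup is an assumption pieced together from the fragments quoted in this thread: the `ps1` class and `doc10` id are invented, and the exact position of the separator relative to the `</div>` is a guess, which is why the pattern tolerates it on either side.

```python
import re

SEP = r'<p class="separator">XXXX</p>'

# Match: closing </p> of the previous text, the separator paragraph, the
# snippet div boundary (in either position), and the opening <p ...> of the next text.
JOIN = re.compile(
    r"</p>\s*(?:</div>\s*)?"
    + SEP
    + r'\s*(?:</div>\s*)?(?:<div class="snippet">\s*)?<p\b[^>]*>'
)

# Assumed sample of compiled epub HTML (hypothetical ids and classes).
html = (
    '<div class="snippet"><p class="ps1" id="doc10">...bla bla bla end of text 1 </p>'
    '<p class="separator">XXXX</p></div>'
    '<div class="snippet"><p class="ps2" id="doc11">start of text 2 bla bla bla...</p></div>'
)

# Dropping the whole matched span fuses the two paragraphs into one.
merged = JOIN.sub("", html)
print(merged)
```

Because one closing and one opening `div` are removed together, the markup stays balanced in this sample, but the pattern is as fragile as any regex over HTML and would need adjusting to the real compiled output.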

gr · March 23, 2021, 10:36pm

Ah yes. I forgot about the need to do the equivalent of removing the pre and post newlines!