RegEx to split paragraphs into single sentences?

Hi there :slight_smile:
Does anyone know whether it would be possible to use a Regular Expression to turn every sentence of a document into its own paragraph?

I mean (all at once) replace period-space with period-carriage return ;
?-space with ?-carriage return ;
!-space with !-carriage return ;
…-space with …-carriage return.

And if so, would it then be possible to include “followed by a capital letter” as a condition ?
So that → Isn’t it so ? he said. ← wouldn’t split.

Thanks

“Isn’t it so? Vincent said” would split.

Yeah. Nothing is perfect.
I can’t think of anything to do in such a case.
On the bright side, I write in French, and for us the name comes after the verb.

C’est pas comme ça ? demande alors Vincent.

This works: Search for: ([\?\!\.\…])\s([A-Z]) replace with: $1\n$2

Explanation and Test at: RegEx101

Beware: the $2 capture group may not work due to a bug in Scrivener's RegEx implementation. You could remove ([A-Z]) and $2, but then a sentence will be split even when the next one does not begin with a capital letter.
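
For anyone who wants to try the pattern outside Scrivener, here is a minimal Python sketch of the same search and replace. It assumes Python's `re` module, which writes the back-references as `\1` and `\2` rather than `$1` and `$2`, and it drops the backslashes inside the character class since they are not needed there. The function name `split_sentences` is only an illustrative choice, and the sketch says nothing about how Scrivener's own engine behaves.

```python
import re

# Same idea as the pattern above: end punctuation, one whitespace
# character, then a capital letter A-Z.
PATTERN = re.compile(r"([?!.…])\s([A-Z])")


def split_sentences(text: str) -> str:
    """Replace the whitespace after end punctuation with a newline,
    but only when the next character is a capital letter."""
    return PATTERN.sub(r"\1\n\2", text)


sample = "Isn't it so? he said. Nothing is perfect! Really… Yes."
print(split_sentences(sample))
```

The lowercase "he" keeps "Isn't it so? he said." on one line, while a sentence such as "Isn't it so? Vincent said." would still be split, as noted earlier in the thread.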

Thanks Antoni.
Ok. Just tried it out.
It only partially worked :
It inserted a literal \n where the splits should have been, instead of a carriage return.
And it did so even where it wasn’t followed by a capital letter.

I just noticed that it actually replaced the space by \n, it did not just add it.

Now I tried with “ignore case” unchecked :
It is better, the formula now respects the “followed by a capital letter” condition. But it is still just replacing the space following the ? ! . or … by \n.
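
The effect of that checkbox can be reproduced outside Scrivener. This is only a sketch in Python, assuming that its `re.IGNORECASE` flag behaves like Scrivener's “ignore case” option, which I have not verified:

```python
import re

text = "Isn't it so? he said. Yes."
pattern = r"([?!.…])\s([A-Z])"

# Case-sensitive: only ". Yes" meets the capital-letter condition.
strict = re.sub(pattern, r"\1\n\2", text)

# Case-insensitive: [A-Z] now also matches "h", so "? he" is split too.
loose = re.sub(pattern, r"\1\n\2", text, flags=re.IGNORECASE)

print(repr(strict))
print(repr(loose))
```

With case ignored, `[A-Z]` no longer distinguishes capitals from lowercase letters, which would explain the splits before lowercase words.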

Same results using find rather than project replace.

So, basically, I just need \n to end up being a carriage return instead…

Why? The lines are not separate paragraphs without a \n (¶) line break.

??? I have no idea what that is supposed to mean. Sorry.
Did you look at my screenshots ?

It means \n is the paragraph break character.

That worked :
I replaced \n with a carriage return

I see. But I want my paragraph to actually be split (in the editor). Not only after compile.

I don’t know what language you’re speaking now. \n is the paragraph break. Carriage return makes it look like separate paragraphs, but they’re not.

You know what would help ? You looking at my screenshots first.

And I think you are just uselessly confusing me now. They ARE separate paragraphs.

This is what my previous screenshot should have been (don’t know how I managed to mess that up) :
image

Using a carriage return instead of \n in the RegEx replacement turns every split into a new paragraph. It works.

Thanks Antoni. :slight_smile:

¶ is a paragraph break, not a carriage return.

image

Glad that worked. \n should’ve worked as well, but the RegEx implementation in Scrivener is… incomplete. :slight_smile:

If you need that space, you could either move the “)” to after the “\s” or add a space after $1 in the substitution.
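
Both options can be sketched in Python, assuming its `re` module and `\1`/`\2` back-references in place of `$1`/`$2`. The sample text is made up for the illustration:

```python
import re

text = "Nothing is perfect. Really."

# Option 1: move the closing parenthesis after \s, so the whitespace
# becomes part of group 1 and is written back by \1.
opt1 = re.sub(r"([?!.…]\s)([A-Z])", r"\1\n\2", text)

# Option 2: keep the original pattern and put a literal space
# after the first back-reference in the replacement.
opt2 = re.sub(r"([?!.…])\s([A-Z])", r"\1 \n\2", text)

print(repr(opt1))
print(repr(opt2))
```

The two give the same result here. They would differ if the whitespace were a tab, for example: option 1 writes back whatever was matched, option 2 always writes a plain space.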

Do you usually add a space before a question or an exclamation mark? Seems odd to me.

Nope. I’m fine without an extra space. But thanks for the info. :slight_smile:

That is French punctuation.
I know my test sentence is English, but might as well run the test as if French, just in case.
(And yes I do it a lot in my posts too. I find it easier to read. Or maybe I’m just too used to it. :wink: )

I think there’s confusion because of the difference between Windows and Mac where CR and LF are concerned, but here’s a simple demo of the difference, on a Mac at least. It may not be relevant to the behavior of Scrivener on Windows, but it is relevant to the meaning of the symbols and . In the 1st screenshot, you’ll see a few paragraphs separated by ¶s (as always) together with a Find/Replace dialog prepared to replace with .

In the 2nd screenshot, you see the result of performing the replace once. The paragraph after it still starts on a new line, but now it’s part of the previous paragraph, which you can see from the fact that it has no indentation.

@Vincent_Vincent's last screenshot suggests I was wrong about the nomenclature.

I thought carriage return = line feed = .

Instead it seems that

  1. line feed =
  2. carriage return = paragraph break =

Mea culpa!

FWIW, this variation also seems to work:

Search for:

([.?:;!…])( )([A-Z])

Replace with:

$1\n$3

… although in Scrivener it seems we need to use a paragraph break in the replace field:

Test: regex 101

In RegEx many different solutions are possible. :wink:

This would split the sentences on “:” and “;” too…

Indeed, yes. The user can configure it by adding or removing any punctuation marks they want.
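
To illustrate what changing the punctuation set does, here is a rough Python sketch. The helper `split_on` and the sample text are hypothetical, made up for this example. It uses `\1` and `\3` where Scrivener's replace field would use `$1` and `$3`:

```python
import re


def split_on(marks: str, text: str) -> str:
    """Split after any character in `marks` when it is followed by
    a single space and a capital letter (hypothetical helper)."""
    # re.escape keeps characters such as "-" or "]" from breaking the class.
    pattern = "([" + re.escape(marks) + "])( )([A-Z])"
    return re.sub(pattern, r"\1\n\3", text)


text = "Note: This matters; Really. Yes."

with_colons = split_on(".?:;!…", text)  # the variation above
without_colons = split_on(".?!…", text)  # ":" and ";" removed

print(repr(with_colons))
print(repr(without_colons))
```

With “:” and “;” in the set, the sample breaks into four lines. Without them, only the final “. Yes.” is moved to a new line.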

I personally like the restriction of searching for a single space ( ) rather than white-space \s, which, as I understand things, matches spaces, new lines, carriage returns, and tabs.
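
A small Python sketch of that difference, assuming Python's `re` semantics for `\s` (other engines may differ in the details). The sample string is invented to contain a tab, a space, and a newline after the periods:

```python
import re

text = "End.\tNext. Then.\nLast."

# \s matches the tab, the space, and the newline alike.
any_whitespace = re.sub(r"([.?!…])\s([A-Z])", r"\1\n\2", text)

# A literal space matches only the plain space before "Then".
single_space = re.sub(r"([.?!…])( )([A-Z])", r"\1\n\3", text)

print(repr(any_whitespace))
print(repr(single_space))
```

With the literal space, the tab after “End.” is left untouched, which is the stricter behavior described above.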

Good to have options.
