RegEx replacement - theory and example please?

Julian_M1 · April 8, 2022, 9:07am

OK, I read the manual & referred to perldoc, but I can’t seem to make a replacement work with RegEx.

Given this text (MWE):

“Sphinx of black quartz, judge my vow.”

UPDATED to make the missing underscores appear
And given the Find regex ([^A-z])S, what is the Replace regex to replace the S with “_S_”? (keeping the text matched by the first group and changing the text after it)

i.e. “SPHINX” → “_S_PHINX”

I tried \g1 (per perldoc), $1 (per a Q&A in the forum), and \1 (cf. Word), but none worked…
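For comparison, here is the same Find/Replace as a minimal Python sketch using the standard `re` module, with the pattern and sample text taken from the question above. Note that Python spells the back-reference `\1` (or `\g<1>`) in the replacement, whereas Scrivener's dialog expects `$1`, so the syntax here is not what you would type into Scrivener.

```python
import re

# The MWE from the question, with its curly quotes.
text = "“Sphinx of black quartz, judge my vow.”"

# Group 1 captures the single character before the S that falls outside
# the A-z range (here the opening quote); it is written back unchanged
# with \1, and the S itself is replaced by _S_.
pattern = r"([^A-z])S"
result = re.sub(pattern, r"\1_S_", text)

print(result)  # “_S_phinx of black quartz, judge my vow.”
```

In other words, a working engine turns the quoted MWE into “_S_phinx… with this pattern. Whether Scrivener accepts `$1` in the same position is a separate question.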

AntoniDol · April 8, 2022, 9:48am

Regex 101

I have no idea what you want to achieve. Can you give the start and end results?

Julian_M1 · April 8, 2022, 10:25am

That will be because I didn’t check the result… and Markdown swallowed my underscores. Sorry about that (I’ll go back and correct it).

Turn “SPHINX” into “_S_PHINX” thx!

AntoniDol · April 8, 2022, 11:20am

RegEx

Now, the RegEx code in Scrivener’s Project Replace is a bit buggy, especially where replacements are concerned, so this may not work in Scrivener. Maybe the document-level Find and Replace with RegEx gives better results?

P.S. I noticed I missed a “^” in the last group of the RegEx, but it doesn’t affect the result.

Julian_M1 · April 8, 2022, 11:50am

Many thanks, but as you feared, this doesn’t work in practice… I had tested a regex on that site before using it to completely mess up my MS (because it “worked” differently in Scrivener, i.e. it didn’t), so I’d prefer to know what works and what doesn’t in the target application.

In a document, your example:

Find: ^(S)(phinx)([^A-Z]+[a-z]+)
Replace: _$1_$2$3

Result: Not found

And if I use my own pattern on a document containing (with quotes) “Sphinx of black quartz, judge my vow.”:
Find: ([^A-z])(S)
Replace: $1_$2_
Result: "S_$2_phinx

i.e. the first group’s replacement worked but the second didn’t (it should have been just the S).
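To see what each group should contain with this pattern, here is a quick check in Python’s `re` (again with `\1`/`\2` standing in for Scrivener’s `$1`/`$2`). A conforming engine puts only the opening quote in group 1, not the whole match, and the two-group replacement then gives the intended result.

```python
import re

text = "“Sphinx of black quartz, judge my vow.”"
pattern = re.compile(r"([^A-z])(S)")

m = pattern.search(text)
print(repr(m.group(0)))  # '“S'  (the whole match)
print(repr(m.group(1)))  # '“'   (first group: the quote only)
print(repr(m.group(2)))  # 'S'   (second group)

# Python equivalent of Scrivener's "$1_$2_" replacement.
fixed = pattern.sub(r"\1_\2_", text)
print(fixed)  # “_S_phinx of black quartz, judge my vow.”
```

So the pattern and replacement are sound in principle. The "S_$2_phinx output is consistent with `$1` being expanded to the whole match and `$2` not being expanded at all.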

I don’t use regexes very often because I’m not very good with them (not knowing which regex flavour a program uses often doesn’t help, but in this case it’s clear). How can I tell whether I’m making a mistake or the feature is buggy? Should my version have worked?

PS For others…

The regex website I referred to is https://regex101.com/ and I think Scrv uses PCRE2… repeat, I think

And issues with the Qt5 implementation used by Scriv seem to be covered here: Regexp engine in Qt5 - Qt Wiki

AmberV · April 8, 2022, 12:06pm

Thanks for the report; this is another variation on a bug in the replacement parsing. Standard $1 notation does work, but, specific to the bug here, it fails if such a marker is found at the very beginning of the replacement string. Add a space in front of it, and you’ll find it works fine.
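To make the “replacement parsing” step concrete: after each match, the engine has to scan the Replace string for markers like `$1` and substitute the captured text. Below is a hypothetical, minimal expander written in Python. It is not Scrivener’s or Qt’s code, and `expand_dollar_template` is an illustrative name. It only handles `$0` to `$9` and treats a marker at position 0 exactly like one anywhere else, which is the case the bug described above gets wrong.

```python
import re


def expand_dollar_template(match: re.Match, template: str) -> str:
    """Hypothetical helper: expand $0-$9 in a replacement template.

    $0 is the whole match and $n is capturing group n. There is no special
    case for a marker at the very start of the template. A $n beyond the
    pattern's group count raises IndexError (no error handling here).
    """
    def group_value(ref: re.Match) -> str:
        # Optional groups that did not take part in the match give None.
        return match.group(int(ref.group(1))) or ""

    return re.sub(r"\$(\d)", group_value, template)


text = "“Sphinx of black quartz, judge my vow.”"
pattern = re.compile(r"([^A-z])(S)")

result = pattern.sub(lambda m: expand_dollar_template(m, "$1_$2_"), text)
print(result)  # “_S_phinx of black quartz, judge my vow.”

# The workaround: a leading space keeps $1 off position 0,
# at the cost of one extra space in the output.
spaced = pattern.sub(lambda m: expand_dollar_template(m, " $1_$2_"), text)
print(repr(spaced))
```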

It’s been a bit of a pain, because evidently the Qt RegEx library was built without a single shred of code for replacement logic, just pattern matching, so they have been having to reinvent the wheel over here to get that working.

By the way, would either of the following patterns work for you? If so, they are also a workaround (without awkward two-step searches to first offset the $1 from the beginning of the string with junk characters and then later strip out the junk characters):

Find

\bS

Replace

_S_

The “\b” word-boundary code would do something very similar to what you’re doing, but in a more robust manner (if that is indeed the intention), and since it’s a zero-width marker you don’t have to capture the prefix.
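Here is a quick Python sketch of what the word-boundary version does. Python’s `re` gives `\b` the same zero-width meaning, though Scrivener’s engine may differ in details. Nothing before the S is captured, so the replacement needs no back-reference, and an S inside a word is left alone. `mark_initial_s` is just an illustrative name.

```python
import re


def mark_initial_s(text: str) -> str:
    # \b matches the zero-width position between a non-word character
    # (or the start of the text) and a word character, so only an S
    # that begins a word is wrapped in underscores.
    return re.sub(r"\bS", "_S_", text)


print(mark_initial_s("“Sphinx of black quartz, judge my vow.”"))
# “_S_phinx of black quartz, judge my vow.”
print(mark_initial_s("SPHINX"))  # _S_PHINX
print(mark_initial_s("GLASS"))   # GLASS (no word-initial S)
```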

Otherwise, I’d go with:

(?<![A-z])S

Same idea: dodge the need to capture the prefix by using a negative look-behind (and since the look-behind’s logic is already negative, we don’t want to stipulate a negated set inside it).
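Here is the look-behind version sketched in Python next to the capture version. `(?<!...)` checks that the preceding character is *not* a letter without consuming it, so it also catches an S at the very start of the text, where `([^A-z])S` has no character to capture. Two hedged notes: the sketch spells the set `[A-Za-z]` rather than `[A-z]`, because the range `A-z` also takes in the six ASCII punctuation characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick). Also, Scrivener’s engine may not behave exactly like Python’s.

```python
import re

CAPTURE_STYLE = r"([^A-z])S"          # needs a real character before the S
LOOKBEHIND_STYLE = r"(?<![A-Za-z])S"  # only checks that no letter precedes it


def via_capture(text: str) -> str:
    # Puts the captured prefix back with \1 (Scrivener would use $1).
    return re.sub(CAPTURE_STYLE, r"\1_S_", text)


def via_lookbehind(text: str) -> str:
    # Nothing is captured, so the replacement is plain text.
    return re.sub(LOOKBEHIND_STYLE, "_S_", text)


quote = "“Sphinx of black quartz, judge my vow.”"
print(via_capture(quote))        # “_S_phinx of black quartz, judge my vow.”
print(via_lookbehind(quote))     # “_S_phinx of black quartz, judge my vow.”
print(via_capture("SPHINX"))     # SPHINX (no character before the S to capture)
print(via_lookbehind("SPHINX"))  # _S_PHINX
```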

AntoniDol · April 8, 2022, 12:07pm

“Obviously, in the code behind these buttons, Scrivener is treating $1 as the entire match, and is not using the capturing group for $2 at all.”

This seems to be the bug. It’s a known bug, so hopefully it will be fixed in the next version.

AntoniDol · April 8, 2022, 12:13pm

I think the quotes mess this up. Without them, it’s found in my Scrivener.

It works (somehow) in regex101.com, but not in Scrivener.
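That explanation is easy to check with Python’s `re`. `^` anchors the match to the start of the text (or of each line with `re.MULTILINE`), so with the opening quote in place the pattern can never reach the S. Without the quote, the same Find/Replace works. Python writes the replacement `_$1_$2$3` as `_\1_\2\3`.

```python
import re

pattern = re.compile(r"^(S)(phinx)([^A-Z]+[a-z]+)")
replacement = r"_\1_\2\3"  # Python spelling of _$1_$2$3

quoted = "“Sphinx of black quartz, judge my vow.”"
unquoted = "Sphinx of black quartz, judge my vow."

print(pattern.search(quoted))              # None: the text starts with “
print(pattern.sub(replacement, unquoted))  # _S_phinx of black quartz, judge my vow.
```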

Julian_M1 · April 8, 2022, 12:48pm

Thank you @AntoniDol and @AmberV

TL;DR I may return to regexes if and when the need is greater than at present… but thanks for the thoughtful contributions.

Alas, I find myself in one of those “Why do I bother?” states of mind, when the real replacement I was trying to do didn’t work (despite success on the test site) and the MWE I invented stumbles into another issue.

@AmberV, your suggestions are no doubt good, but to the extent they apply to an artificial example I don’t need them, and to the extent that their relevance to real applications is limited, they’re not worth losing further sleep over checking, refining, etc. Besides, I am not worthy of further detail: you lost me at “negative look-behind”.

@AntoniDol all the same, can you share where the quote came from?

So it goes. Didn’t somebody already say that?

Rdale · April 8, 2022, 1:41pm

If you want to make use of Regex on Windows for individual documents in the binder, you may find it best to just use Sync with External Folder, and open the external sync files in a word processor/text editor that has a robust regex available to it.

If you export to plain text, you can even experiment with scripting the changes if you have such tools installed on your PC, redirecting the output to another file and using a diff tool to examine the changes.
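Here is one possible shape for that script-and-diff workflow, as a minimal Python sketch. It assumes the text has already been exported to a UTF-8 plain-text file; the script name, file names and function names are hypothetical. It writes the changed text to a *separate* file so the original is never touched, and prints a unified diff (standard-library `difflib`) for review.

```python
from __future__ import annotations

import difflib
import re
import sys
from pathlib import Path


def apply_regex(text: str, pattern: str, replacement: str) -> tuple[str, int]:
    """Return the rewritten text and the number of substitutions made."""
    return re.subn(pattern, replacement, text)


def make_diff(before: str, after: str, name: str) -> str:
    """Unified diff of the two versions; empty string if nothing changed."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=name,
        tofile=name + " (edited)",
    )
    return "".join(lines)


def main(argv: list[str]) -> None:
    # Usage (hypothetical file names; Python replacement syntax, e.g. \1):
    #   python regex_edit.py draft.txt draft-edited.txt '\bS' '_S_'
    if len(argv) != 5:
        print("usage: regex_edit.py SOURCE DEST PATTERN REPLACEMENT")
        return
    src, dst, pattern, replacement = argv[1:5]
    before = Path(src).read_text(encoding="utf-8")
    after, count = apply_regex(before, pattern, replacement)
    Path(dst).write_text(after, encoding="utf-8")  # original left untouched
    print(f"{count} replacement(s) written to {dst}")
    print(make_diff(before, after, src))


if __name__ == "__main__":
    main(sys.argv)
```

Once the diff looks right, the edited text can be brought back by hand or through whatever import route you trust.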

Julian_M1 · April 8, 2022, 2:30pm

Thanks. I wanted to use it on the whole project (e.g. a global change from “a word…” to “a word …”, i.e. an extra space before the ellipsis), but after screwing that up, I was testing regexes at document level only, for safety.
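For that particular global change, a look-behind stops the pattern from touching ellipses that already have a space in front. Below is a minimal Python sketch. It assumes the ellipsis is either the single character `…` (U+2026) or three full stops, so check which form the manuscript actually uses first. The names are illustrative.

```python
import re

# A word character immediately followed by an ellipsis ("…" or "...").
# The look-behind checks the preceding character without consuming it,
# so only the ellipsis itself is replaced (with a space in front).
ATTACHED_ELLIPSIS = re.compile(r"(?<=\w)(…|\.\.\.)")


def space_before_ellipsis(text: str) -> str:
    return ATTACHED_ELLIPSIS.sub(r" \1", text)


print(space_before_ellipsis("a word… and more..."))  # a word … and more ...
print(space_before_ellipsis("a word …"))             # a word … (unchanged)
```

Because already-spaced ellipses are skipped, running it twice gives the same result as running it once, which makes it somewhat safer for a project-wide pass.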

A BAD THING happened to me when I used sync files… I have yet to get back on that horse.

AntoniDol · April 9, 2022, 12:32am

Bug in Find form when using regular expressions

Julian_M1 · April 10, 2022, 7:49pm

Thanks - I see from that the bug has been reproduced and added to a fix list…