Problems with RegEx search and replace

I’m trying to use RegEx search-and-replace and running into some interesting glitches. Note that this is in the regular Find-and-Replace engine; Project Replace has its own issues, which I cover further down. (And why in the world is Project Replace different from the usual Find-and-Replace, anyway?)

The goal is to replace ‘OK’ at the start of a sentence with ‘Okay’, while mid-sentence it’s ‘okay’. I’m fine with using two find/replaces to do this. Of course, I might be able to do it with multiple non-regex operations, but a bug is a bug, and this looks like a bug.

Test text: ‘This is a test. OK, this is really a test.’
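
For reference, here is a minimal sketch of the two-pass replacement described above, written in Python rather than Scrivener. The function name `fix_ok` and the sentence-boundary rule (start of text, or `.`, `!`, `?` followed by whitespace) are assumptions made for illustration, and Python spells the backreference `\g<1>` where Scrivener uses `$1`.

```python
import re

def fix_ok(text: str) -> str:
    """Two-pass OK -> Okay/okay replacement (illustrative helper, not Scrivener code)."""
    # Pass 1: "OK" at the very start, or right after sentence-ending punctuation.
    # Group 1 keeps whatever preceded "OK" so it can be put back unchanged.
    text = re.sub(r"(^|[.!?]\s+)OK\b", r"\g<1>Okay", text)
    # Pass 2: any standalone "OK" still left must be mid-sentence.
    return re.sub(r"\bOK\b", "okay", text)

print(fix_ok("This is a test. OK, this is really a test."))
```

The order matters: running the mid-sentence pass first would lowercase the sentence-initial cases too.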

First try:
Find string: (\w+.) OK
Replace string: $1 Okay

What Scrivener does on Find:
Highlights ‘test. OK’
What Scrivener does on Replace:
Produces ‘$1 Okay’

This is obviously not OK.
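
For comparison, this is roughly what the first try looks like in a reference engine. It is a sketch in Python, not Scrivener's engine; the only change is that Python writes the backreference as `\1` instead of `$1`. The unescaped `.` in the Find string matches any character, so `\.` would be stricter, but it is left as posted so the result is comparable.

```python
import re

text = "This is a test. OK, this is really a test."

# Same Find string as above, unchanged.
pattern = re.compile(r"(\w+.) OK")

match = pattern.search(text)
highlighted = match.group(0)   # what Find should highlight
captured = match.group(1)      # what $1 should contain

# Same Replace string, with \1 standing in for $1.
result = pattern.sub(r"\1 Okay", text)

print(highlighted)
print(captured)
print(result)
```

Here the highlighted span agrees with what Scrivener shows on Find; only the replacement step differs.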

Second test: (S is the space character here, not the letter S)
Find string: S(\w+.) OK
Replace string: S$1 Okay

What Scrivener does on Find:
Highlights ‘ test. OK’
What Scrivener does on Replace:
Produces ‘ test. OK Okay’

This is even worse. Yes, we verified that $1 does actually work when not at the start of a line, but the space and OK are being pulled into $1 even though they sit clearly outside the parentheses. In the broken no-leading-space version, they weren’t.

Third test: (S is the space character here, not the letter S)
Find string: (S)(\w+.)( OK)
Replace string: S$2 Okay

What Scrivener does on Find:
Highlights ‘ test. OK’
What Scrivener does on Replace:
Produces ‘ $2. Okay’

So it’s not creating more than one capturing group, even though the pattern has three? Using $1 instead shows that, yes, everything is lumped together into a single capturing group, no matter that the pattern breaks it out as three.

So, let’s do a simple verification of that.
Find string: (is) (a) (test)
Replace string: X$1X$2X$3

What Scrivener does on Find:
Highlights ‘is a test’
What Scrivener does on Replace:
Produces ‘Xis a testX$2X$3’

Ugh. Something is completely broken! I’m using Windows 3.0.1.0 (1274647) 64-Bit, in case that matters. I’ve searched the forums and see people having obvious success with RegEx on Macs, so this, again, seems like a bug.
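
As a sanity check on the expected behavior, the same three-group test can be run in a reference engine. This is a Python sketch, with `\1`, `\2`, `\3` standing in for `$1`, `$2`, `$3`; it says nothing about how Scrivener is implemented.

```python
import re

text = "This is a test. OK, this is really a test."

# Three separate capturing groups, exactly as in the Find string above.
pattern = re.compile(r"(is) (a) (test)")

match = pattern.search(text)
groups = match.groups()        # each group should hold its own word

# X$1X$2X$3 written with Python's backreference syntax.
result = pattern.sub(r"X\1X\2X\3", text)

print(groups)
print(result)
```

Each group holds exactly one word, so the replacement comes out as XisXaXtest inside the sentence.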

Now, Project Replace. We’re better here! The (is) (a) (test) test passes just fine, producing XisXaXtest. Yay!

And ‘( )(\w+.)( OK)’ with replace string ‘ $2 Okay’ produces ‘test. Okay’. Yay again!

But… ‘(\w+.)( OK)’ with replace string ‘$1 Okay’ produces ‘$1 Okay’. Not good. If I add a space before the $1 (‘ $1 Okay’), then it works, at the cost of adding an extra space.

Now, ‘ (\w+.)( OK)’ with replace string ‘ $1 Okay’ gets me ‘test. Okay’. So, yay again, and something I can use, though with a lot less confidence now that I’ve seen this break in so many interesting ways.

Summing up: I think $1 is broken if it’s the first thing in a replace pattern in Project Replace. I think a great deal of regular Find-And-Replace is broken for regular expressions, but not all of it: it’s obvious that backreferences are partly supported. They’re just not supported correctly.
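
To illustrate why a leading $1 should not be special, here is a minimal sketch of `$N` expansion in a replace string. It is written in Python and is an assumption about what such a parser needs to do, not a description of Scrivener's actual code; the names `expand` and `replace_all` are made up for this example.

```python
import re

# A "$" followed by a single digit, anywhere in the replace string.
_REF = re.compile(r"\$(\d)")

def expand(template: str, match: re.Match) -> str:
    """Substitute $0..$9 in template with the groups of match (hypothetical helper)."""
    def lookup(ref: re.Match) -> str:
        index = int(ref.group(1))
        if index > match.re.groups:
            return ref.group(0)          # no such group: leave "$N" as written
        return match.group(index) or ""  # group that did not participate -> empty
    return _REF.sub(lookup, template)

def replace_all(pattern: str, template: str, text: str) -> str:
    """Replace every match of pattern, expanding $N references per match."""
    return re.sub(pattern, lambda m: expand(template, m), text)

text = "This is a test. OK, this is really a test."
print(replace_all(r"(\w+.)( OK)", "$1 Okay", text))
print(replace_all(r"(is) (a) (test)", "X$1X$2X$3", "is a test"))
```

Because the references are located by scanning the whole replace string, a `$1` at position zero is handled the same way as one preceded by a space.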

It’d be really helpful if regular Find-and-Replace worked, as Project Replace is opaque and hard to debug. I’ll have some trepidation about turning it loose over the whole manuscript given the flaky behavior. And it’d be really helpful if Project Replace didn’t need an artificial lead character before a backreference.

I’ve tested this in several other PCRE-compatible packages (Notepad++ is a very easy test case), and everything works perfectly every time, including in interactive find-and-replace.

I can grudgingly accept the answer ‘but it’s not documented to work in Find-And-Replace’, but I’m not sure why that would be hard, given that the engine in Project Replace seems so very close to working correctly. You’ve already got the engine - just hook it up and let it go. And hopefully fixing Project Replace’s aversion to leading backreferences isn’t a big problem.

Thanks for reading this long explanation. Hopefully it’s useful, and hopefully I’m not making a dumb mistake (though everything working fine in both another program and an online PCRE tester tells me I’m probably not).


Thanks for the report! There are a lot of obvious problems in here; I’ve forwarded the list to the developer to take a look at.

Thank you - I appreciate it!

Hi, sorry to gravedig here, but where can I find whether this was resolved or not? I’m having the same trouble when the capture group is at the very start of replace, both in Search & Replace as well as in Compile mode, which is very bothersome.

Sorry for the tardy response. No, this ticket hasn’t been looked at yet. I did notice, however, that the Replace All button appears to work better than either of the other two replacement buttons. It does not dodge all of the problems listed above, though, particularly where the replacement begins with a capture group reference, like $1 something.

Hi @AmberV, thank you for replying! Unfortunately that’s not a solution for me, as I’ve encountered the problem more often when compiling than in plain Search & Replace, and compiling is effectively always “replace all”, as I understand it. (If further details are important, here’s the case: Regex working for macOS, but not Windows if referral back is at very start of replace - #6 by xiamenese)

Yes, we do have that particular scenario written up as a ticket, along with all of the issues above more collectively. To be clear, the $1 something problem impacts Replace All as well. I just meant to point out that it has fewer problems, not that it has none.

Replacements are another can of worms, with their own bugs and edge cases to watch out for. I’m not sure how much code they have in common with Find & Replace, but they do both use the same Regex engine and bespoke replacement string parsing, so they will share some issues in that regard.