Regex not acting as expected

jpottsx1 · March 30, 2023, 12:11pm

I am trying to use a regex to find the leading number (digits, possibly with a fraction such as ½) at the beginning of a line, followed by a space character, and replace that space with a tab character.

I’m using

^(?:\d+|\d*[½¼⅓⅔¾])\K\s

which works correctly in Regex101 but returns no results when run with Scrivener’s regex Find feature.

AmberV · March 30, 2023, 1:03pm

The \K operator is a Perlism that may not be supported by the RegEx engine we use. I’d try without it, perhaps with some other construction depending on what you are trying to do, but from my reading you simply don’t need it here.
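A minimal illustration, using Python’s standard `re` module purely as a stand-in (it is not ICU, but like ICU it has no `\K`): the pattern with `\K` fails to compile, and the usual rewrite drops `\K`, captures the quantity in a group, and writes it back in front of the tab. The names `compiles`, `QUANTITY_SPACE` and `tab_after_quantity` are hypothetical, and the sketch matches a literal space rather than `\s` so that a line holding only a number never has its newline swallowed.

```python
import re

# The pattern from the first post, as written for PCRE (Regex101).
ORIGINAL = r"^(?:\d+|\d*[½¼⅓⅔¾])\K\s"

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:  # re reports \K as a bad escape
        return False

# Without \K the leading quantity is part of the match, so capture it as
# group 1 and write it back in front of the tab. A literal space is used
# instead of \s so a line holding only a number never loses its newline.
QUANTITY_SPACE = re.compile(r"^(\d+|\d*[½¼⅓⅔¾]) ", re.MULTILINE)

def tab_after_quantity(text: str) -> str:
    """Replace the space after a leading quantity on each line with a tab."""
    return QUANTITY_SPACE.sub(lambda m: m.group(1) + "\t", text)

if __name__ == "__main__":
    print(compiles(ORIGINAL))  # False
    print(repr(tab_after_quantity("1½ tsp salt\n12 oz milk")))
```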

jpottsx1 · March 30, 2023, 1:36pm

Which regex engine and/or flavour should I be targeting, to eliminate the ambiguity when writing our expressions?

xiamenese · March 30, 2023, 1:36pm

I’m no guru, but Scrivener (like other applications using the Apple Text Kit, such as Nisus Writer Pro) uses the ICU RegEx engine provided by Apple, not PCRE as in RegEx 101. As far as I can see from the ICU documentation, there is no “\K” metacharacter in ICU (Regular Expressions | ICU Documentation).

I tested with the following strings, each as a separate paragraph interspersed in my test document:

12345½ Something

and

6789⅓ Anything

using the find string ^(?:\d*[½¼⅓⅔¾])\s and, as the replace string, just a tab character entered with Opt-Tab. It worked.

The strings I was working on probably don’t match yours, but they might give you a point to start from.
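One detail worth checking in find strings like this: inside square brackets `|` is an ordinary character, not alternation, in most flavours (and, as far as I can tell, in ICU as well). A class written as `[½|¼|⅓|⅔|¾]` therefore also matches a literal pipe, while `[½¼⅓⅔¾]` already means “any one of these fractions”. A quick check, again with Python’s `re` as a stand-in for ICU; the names `WITH_PIPES` and `WITHOUT_PIPES` are hypothetical.

```python
import re

# '|' inside [...] is a literal character, so this class has six members.
WITH_PIPES = re.compile(r"^(?:\d*[½|¼|⅓|⅔|¾])\s", re.MULTILINE)
# The class alone already means "any one of these fractions".
WITHOUT_PIPES = re.compile(r"^\d*[½¼⅓⅔¾]\s", re.MULTILINE)

samples = ["12345½ Something", "6789⅓ Anything", "42| not a fraction"]

for s in samples:
    # Print whether each variant finds a match in the sample line.
    print(repr(s), bool(WITH_PIPES.search(s)), bool(WITHOUT_PIPES.search(s)))
```

Both variants accept the two test strings above; only the version with pipes also accepts the third line.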

As for the RegEx implementation in Scrivener, it seems to me there are some shortcomings. Some time ago there was a thread about applying uppercase through RegEx Find and Replace, but we couldn’t get any of the uppercase switches to work. In my tests above, typing \t in the Replace field merely inserted a ‘t’, so I had to enter the tab with Opt-Tab. It seems the Replace field is not RegEx-aware, which would explain why the uppercase replacement didn’t work.

Perhaps Ioa could dig deeper if he has time.

Mark

jpottsx1 · March 30, 2023, 2:31pm

That selects the digits and the space, but I only want the space selected. The digits at the beginning of the line are simply used to locate that first space character, which will be changed to a tab character.

I’m trying to parse recipe ingredients and measurements into a tabular form for further formatting.

Currently, the next step, using this regex, works flawlessly:

(?<=(\b(tsp|tbsp|oz|fl oz|cup|pt|qt|gal|lb|g|kg|L|mL|ds|pn|smdg|min|dr.|Lg|Med|Sm|Petit|Square)\b))\s
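For the overall goal of splitting an ingredient line into quantity, unit and the rest with tabs, here is a rough two-step sketch in Python; the names `UNITS`, `QUANTITY`, `UNIT` and `to_tabbed` are hypothetical. ICU accepts a lookbehind as long as its length is bounded, which is presumably why the pattern above works in Scrivener, but Python’s `re` only allows fixed-width lookbehind and would reject it, so the sketch uses the capture-and-restore rewrite instead (ICU’s replacement syntax writes the group as `$1`). It also escapes each unit, because the unescaped `.` in `dr.` matches any character, and it uses the following space as the right-hand boundary instead of a trailing `\b`: there is no word boundary between a `.` and a space, so `dr.` followed by `\b` can never match before a space.

```python
import re

# Unit abbreviations as posted above; "dr." is assumed to mean a literal "dr.".
UNITS = ["tsp", "tbsp", "oz", "fl oz", "cup", "pt", "qt", "gal", "lb", "g",
         "kg", "L", "mL", "ds", "pn", "smdg", "min", "dr.", "Lg", "Med", "Sm",
         "Petit", "Square"]

# Step 1: tab after the leading quantity (whole number or fraction).
QUANTITY = re.compile(r"^(\d+|\d*[½¼⅓⅔¾]) ")
# Step 2: tab after the first unit; the literal space is the right boundary.
UNIT = re.compile(r"\b(" + "|".join(re.escape(u) for u in UNITS) + r") ")

def to_tabbed(line: str) -> str:
    """Turn '1½ tsp salt' into '1½<TAB>tsp<TAB>salt', one line at a time."""
    line = QUANTITY.sub(lambda m: m.group(1) + "\t", line, count=1)
    return UNIT.sub(lambda m: m.group(1) + "\t", line, count=1)

if __name__ == "__main__":
    for raw in ["1½ tsp salt", "12 fl oz milk", "2 Lg eggs", "1 dr. vanilla"]:
        print(repr(to_tabbed(raw)))
```

Only the first unit on each line is tabbed (`count=1`), so a word later in the ingredient name that happens to match a unit is left alone.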

xiamenese · March 30, 2023, 2:41pm

Ah, I didn’t know exactly what you were trying to do…

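Perhaps try the replace string $1➝ (➝ here being a Tab entered with Opt-Tab), i.e. keep the bracketed match and replace only the space with a Tab? For $1 to refer to it, the brackets must form a capturing group, ( ) rather than (?: ). But as I say, it seems to me there is a problem with the Replace field not being fully RegEx-compliant.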
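The capturing-group point can be checked quickly, again with Python’s `re` standing in for ICU. Python writes the back-reference as `\1` where ICU’s replacement syntax uses `$1`, and it raises an error for a group that does not exist; other tools may instead drop the digits or report an error. Either way, the quantity is only kept when the group is capturing. The names `NON_CAPTURING`, `CAPTURING` and `try_replace` are hypothetical.

```python
import re

LINE = "2 cups flour"

# (?:...) groups without capturing, so there is no group 1 to refer back to.
NON_CAPTURING = re.compile(r"^(?:\d+|\d*[½¼⅓⅔¾]) ")
# (...) captures the quantity as group 1.
CAPTURING = re.compile(r"^(\d+|\d*[½¼⅓⅔¾]) ")

def try_replace(pattern: re.Pattern, text: str) -> str:
    """Apply a group-1-plus-tab template; return 'error' if group 1 is missing."""
    try:
        return pattern.sub(r"\1" + "\t", text)
    except re.error:  # raised for "invalid group reference 1"
        return "error"

if __name__ == "__main__":
    print(try_replace(NON_CAPTURING, LINE))
    print(repr(try_replace(CAPTURING, LINE)))
```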