<b33> The new RegEx implementation broke important functionality?

adrm · January 9, 2020, 8:01am

Before beta 33 I could search for Unicode characters by their code. For example, a RegEx search for \x2028 would locate a soft line break, and \x201c would highlight all left ‘smart’ quotes.

I have found this functionality invaluable on a regular basis.

With beta 33 this no longer seems to work (although ASCII character searches still seem to function, e.g. \x0027 for a straight single apostrophe = ').

I hope this is considered a bug, and not ‘this brings the Windows version into line with the Mac version, so it will stay this way’.

tiho_d · January 9, 2020, 8:10am

What happens when you search with the Unicode character itself (typing it), instead of the character code?

adrm · January 9, 2020, 8:22am

That works, at least for the characters I tested.

Unfortunately, that will often not work or is inconvenient.

Inconvenient:
As far as I know, you can’t type a smart quote into the search field. Instead, you must copy/paste one in.

Impossible:
As far as I know, there is no way to input a soft line break into the search field. I believe the use of the Unicode \x2028 is required in such cases.

tiho_d · January 9, 2020, 8:40am

When you search with a straight quote using a regexp, Scrivener will indeed search for all smart quotes too. Give it a try.

To search for soft line breaks, you may try ‘\n’ or ‘\r’.

I personally believe that searching with typed Unicode text is much more convenient than searching with Unicode codes.

Let me know if this helps.

adrm · January 9, 2020, 8:57am

Thanks for the suggestions, but at my end, neither \n nor \r (or combinations of those) locates the soft line break.

\r seems to locate end-of-paragraph, but fails to highlight the (visible) symbol. (I have reported on this issue in the past)

Also, let me offer an argument for searching for specific codes:
For example: Sometimes I want to verify that I have not inadvertently included a straight apostrophe in a manuscript that uses the smart variety.
A search that returns occurrences of both straight and smart types is useless for this purpose.

I argue that sometimes fine control is desired and appropriate, and if the previous functionality can be returned for those of us that use it, I’d be grateful.
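To illustrate the kind of fine control being asked for, here is a minimal sketch in Python. It uses Python's own `re` module, not Scrivener's engine, and the sample text and the helper name `find_offsets` are made up for the example. It shows that a search by exact code point separates straight apostrophes from smart ones:

```python
import re

# Hypothetical sample text: one straight apostrophe (U+0027)
# and one smart apostrophe (U+2019).
sample = "It's here, and it\u2019s there."

# \x27 matches only the straight apostrophe, U+0027.
STRAIGHT = re.compile(r"\x27")
# Left and right single quotation marks, U+2018 and U+2019.
SMART = re.compile(r"[\u2018\u2019]")


def find_offsets(pattern, text):
    """Return the character offset of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


straight_hits = find_offsets(STRAIGHT, sample)
smart_hits = find_offsets(SMART, sample)
print("straight:", straight_hits, "smart:", smart_hits)
```

A check like this only works when the engine treats the two characters as distinct, which is the behaviour the proofreading use case depends on.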

tiho_d · January 9, 2020, 9:08am

If you want to check for a mistyped straight quote, you can use a direct text search, i.e. not a reg-exp search, which should give you all documents with straight quotes.

I do not mean to argue that the new implementation is perfect.

What I want to say is that you might have to adjust some of your searches to the new RegEx engine. If something cannot be achieved, we will definitely try to improve it.

The new engine is fully Perl reg-ex compatible (instead of the previous Qt-specific variants) and is much faster, so I believe it was a good and desired change. Obviously we have to close any gaps we might have missed.

tiho_d · January 9, 2020, 9:10am

Can you please upload a small project with a document that includes a soft line break, too? Thanks!

adrm · January 9, 2020, 9:32am

I created a new document (in beta 32) that includes a soft break after the second line in the initial scene.

Note:
I didn’t test search functionality in this file, as I assume it will work like my older working documents.
Test with soft break.scriv.zip (418 KB)

MimeticMouton · January 9, 2020, 12:03pm

A RegEx search in beta 33 should still be distinguishing between straight and curly quotes; this is at least working correctly in my tests. Different muscle memory, but you can directly enter the smart quotes/apostrophes using the Windows Alt codes to type the characters with the number pad. Alt+0145 through Alt+0148 are what you’re after.
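For anyone keeping a lookup table, the Alt codes above can be related to Unicode code points with a short Python sketch. It assumes the zero-prefixed Alt codes are interpreted in the Windows-1252 code page (the usual case on Western-European Windows setups, but an assumption here), and the helper name `alt_code_to_char` is made up for the example:

```python
# Alt+0145 .. Alt+0148 correspond to bytes 0x91 .. 0x94.
# Assumption: the system's ANSI code page is Windows-1252.
alt_codes = [145, 146, 147, 148]


def alt_code_to_char(code):
    """Decode a single Windows-1252 byte value into its Unicode character."""
    return bytes([code]).decode("cp1252")


table = {code: alt_code_to_char(code) for code in alt_codes}

# Print only the code points, so the output is safe on any console encoding.
for code, ch in table.items():
    print(f"Alt+0{code} -> U+{ord(ch):04X}")
```

Under that assumption the four codes map to U+2018, U+2019, U+201C and U+201D, i.e. the left and right single and double smart quotes.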

adrm · January 9, 2020, 1:27pm

I appreciate the suggestions but I’ll again offer an argument:

Up until now, I could do ALL my searching, simple or complex, with the method set to RegEx.
I also have a ‘library’ of RegEx strings for various purposes, including the codes for unusual characters/symbols.

With the new solution, if I understand you two correctly, I’d have to switch between various methods and look up Alt codes to achieve the same things.
If the previous ability to search for word-length Unicode is restored, we’d have a choice on how to set up our preferred workflow.

I haven’t looked it up, but I believe Unicode searches are part of most? / all? modern RegEx implementations. Or …?

devinganger · January 9, 2020, 8:06pm

If it’s a fully Perl-compatible RE library, will this syntax work?

\N{U+263D} Unicode character (example: FIRST QUARTER MOON)

(from perldoc.perl.org/perlre.html )

adrm · January 28, 2020, 10:22am

Sorry Devin, I missed your reply in the past.
I finally got around to doing some more testing.

As it turns out, adhering to the Perl syntax works. For example, whereas \x2028 used to work as a search for a soft line break, one now has to use \x{2028}.
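A likely reason for the difference, offered as an assumption rather than a statement about Scrivener's internals: in Perl-style syntax an unbraced \x escape takes at most two hex digits, so \x2028 reads as \x20 (a space) followed by the literal text "28". Python's `re` module follows the same two-digit rule for \x, so it can stand in for a quick demonstration. Note that Python spells the code point as \u2028 rather than \x{2028}, and the sample strings below are made up:

```python
import re

SOFT_BREAK = "\u2028"  # U+2028 LINE SEPARATOR

# Hypothetical sample strings.
text_with_break = "first line" + SOFT_BREAK + "second line"
text_with_space_28 = "room 28"

# Unbraced form: parsed as \x20 (space) followed by the literal "28".
old_style = re.compile(r"\x2028")
# Python's spelling of the single code point U+2028.
# (Perl-compatible engines write this as \x{2028}.)
code_point = re.compile(r"\u2028")

old_finds_break = old_style.search(text_with_break) is not None
old_finds_space_28 = old_style.search(text_with_space_28) is not None
new_finds_break = code_point.search(text_with_break) is not None

print(old_finds_break, old_finds_space_28, new_finds_break)
```

In this sketch the unbraced pattern misses the soft break and instead matches a space followed by "28", while the explicit code-point form finds the break.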

UNFORTUNATELY, one issue remains, and I’ve flagged this before (sorry if the cause has been explained in the past):

While Scrivener does highlight the search hits inside each document correctly, the documents containing hits are NOT listed under Search Results as they would be if I wasn’t searching for Unicode characters.
This means that one has to go to every document and scroll through them from start to end to look for yellow highlights visually.

Nor are hits within Notes and (presumably) Synopsis shown or highlighted, even if Search In All is selected.

As long as Scrivener offers RegEx as a search method, I strongly feel these two issues classify as bugs that need to be addressed.