[MDEV-7127] POSIX collating elements are not supported Created: 2014-11-18  Updated: 2022-11-09  Resolved: 2022-11-09

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.0.14
Fix Version/s: N/A

Type: Bug
Priority: Minor
Reporter: Johann
Assignee: Alexander Barkov
Resolution: Won't Fix
Votes: 1
Labels: regexp
Environment: Ubuntu 14.04



 Description   

Running the following queries produces an error, which in my case breaks replication because the master is currently MySQL 5.5.40:

SELECT ' ' REGEXP '[[.space.]]';
 
SELECT '.' REGEXP '[[.period.]]';

Error:

ERROR 1139 (42000): Got error 'POSIX collating elements are not supported at offset 1' from regexp



 Comments   
Comment by Elena Stepanova [ 2014-11-18 ]

Thanks for the report.

bar,
I suppose it's a PCRE limitation, but the replication failure is very unfortunate. Is there anything we can do about it? Maybe a non-default mode or a new version which allows the syntax?

Comment by Alexander Barkov [ 2014-11-19 ]

The old Henry Spencer regex library supported a number of character names:
https://mariadb.com/kb/en/mariadb/documentation/functions-and-operators/string-functions/regular-expressions-functions/regular-expressions-overview/#character-names
This was a non-standard, non-POSIX extension in the old library.

In POSIX regex the syntax '[[.xxx.]]' is reserved for collating elements.
For some reason, Henry Spencer reused the same syntax for his character names extension.

PCRE does not support collating elements yet (but I guess it will in the future).
Currently PCRE only recognizes this syntax and returns the error shown in the description.

A number of workarounds are possible:

For space:

SELECT ' ' REGEXP ' ';
SELECT ' ' REGEXP '[ ]';
SELECT ' ' REGEXP '[[:space:]]';
SELECT ' ' REGEXP '\\{20}';
SELECT ' ' REGEXP '\\x{20}';

For dot:

SELECT '.' REGEXP '[.]';
SELECT '.' REGEXP '\\.';
SELECT '.' REGEXP '\\x2E';
SELECT '.' REGEXP '\\x{2E}';

How difficult would it be to change your application to use these workarounds?

These two are POSIX compliant and are supported by both libraries:

SELECT ' ' REGEXP ' ';
SELECT '.' REGEXP '[.]';
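
When the patterns are generated by application code, the portable forms above can be produced mechanically. The following is a minimal sketch, not a definitive tool: the function name rewrite_char_names and the CHAR_NAMES table are hypothetical, and the table only covers the two names that appear in this report (space and period). It does a plain textual substitution and does not parse the pattern, so it assumes every '[.name.]' occurrence sits inside a bracket expression, as in '[[.space.]]'.

```python
import re

# Hypothetical mapping from Henry Spencer character names to literal characters.
# Only the two names seen in this report are listed; extend as needed.
CHAR_NAMES = {
    "space": " ",
    "period": ".",
}

# Matches the inner '[.name.]' part of a bracket expression such as '[[.space.]]'.
_NAME_RE = re.compile(r"\[\.([A-Za-z-]+)\.\]")


def rewrite_char_names(pattern: str) -> str:
    """Replace '[.name.]' with the literal character it names.

    Assumes each occurrence is already inside a bracket expression, so
    '[[.space.]]' becomes '[ ]' and '[[.period.]]' becomes '[.]'.
    Raises ValueError for names that are not in CHAR_NAMES.
    """
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in CHAR_NAMES:
            raise ValueError(f"unknown character name: {name!r}")
        return CHAR_NAMES[name]

    return _NAME_RE.sub(repl, pattern)


if __name__ == "__main__":
    for old in ("[[.space.]]", "[[.period.]]"):
        print(old, "->", rewrite_char_names(old))
```

Unknown names raise an error instead of passing through unchanged, so a pattern that would still fail on MariaDB is caught before the query is sent.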

Comment by Johann [ 2014-11-19 ]

Fortunately it will not be too difficult to change in the instance where it is causing problems.

Comment by Alexander Barkov [ 2014-11-26 ]

Thanks. I reported the issue to the PCRE team. Changing priority to minor for now.
We'll escalate the bug if we have more related problems reported.

Comment by Marcus Frolander [ 2015-08-20 ]

Same issue with a module for a shopping cart. Currently looking into whether or not this can be easily swapped out in the code, although since the module isn't developed internally, that's likely not going to be an all too easy task.

Would appreciate a fix for this issue.

Comment by Rick James (Inactive) [ 2017-08-16 ]

SELECT ' ' REGEXP '\\{20}'

--> 0 in MariaDB 10.2.2 and Percona 5.6.22-71.0; so perhaps a poor constant?

SELECT ' ' REGEXP '\\x{20}';

--> 1 for 10.2.2, but 0 for 5.6.22 – Suggest you take note of this incompatibility.

Comment by Sergei Golubchik [ 2017-08-17 ]

Yes. MariaDB uses PCRE and what you observed are the consequences of this fact.

PCRE supports \ddd, \xhh, and \x{hhh} for specifying characters by their codes. That's why you got a mismatch in the first query and a match in the second. MySQL and Percona use Henry Spencer's library, which does not support the \x{hhh} syntax.

Comment by Sergei Golubchik [ 2022-11-09 ]

It's not something we can fix. PCRE is no longer developed. PCRE2 is the currently supported version, but it still returns the same error, unfortunately.

Generated at Thu Feb 08 07:17:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.