[MDEV-8102] REGEXP function fails to match hex values when expression is stored as a variable Created: 2015-05-05 Updated: 2015-05-14 Resolved: 2015-05-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 10.0.17 |
| Fix Version/s: | 10.0.20 |
| Type: | Bug | Priority: | Major |
| Reporter: | Mark Punak | Assignee: | Alexander Barkov |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | function, regexp, verified | ||
| Environment: |
CentOS 7 |
||
| Description |
|
The regexp function works differently when evaluating a string stored in a variable than it does when evaluating a string literal. Example:
The select statement above successfully matches via the regular expression only when the string literal is used. The evaluation of the variable fails to match, even though Maria is saying the string literal and the variable have matching values. The problem appears to be related to the use of the hex values in the regular expression, as the following statement evaluates without issue:
I tested this against MySQL 5.5, where the evaluation fails for both the string literal and the variable (completely non-functional in all situations), so I decided to report the bug here instead. |
| Comments |
| Comment by Alexander Barkov [ 2015-05-07 ] | ||||||||||||||
|
A smaller test reproducing the problem:
The result is:
| ||||||||||||||
| Comment by Alexander Barkov [ 2015-05-07 ] | ||||||||||||||
|
The difference happens because user variables have IMPLICIT collation derivation and therefore have precedence over literals (which have COERCIBLE collation derivation). So this script:
chooses latin1_swedish_ci as an effective collation for the operation, because a user variable is stronger than a literal. Then, RLIKE converts the operands to utf8 (which is internal character set for PCRE when comparing text strings)
With a string literal given in the pattern it works differently:
Both operands have COERCIBLE derivation, so it chooses "binary" as a common collation for the two operands, because "binary" has precedence over a text string in case if derivations of two operands are the same. The difference between a literal and a user variable is usually very subtle, but it apparently gets quite sensitive in case of RLIKE. | ||||||||||||||
| Comment by Alexander Barkov [ 2015-05-07 ] | ||||||||||||||
|
A number of workarounds are possible:
| ||||||||||||||
| Comment by Alexander Barkov [ 2015-05-14 ] | ||||||||||||||
|
We won't do any fixes that change the behaviour.
|