[MDEV-31496] Make optimizer handle UCASE(varchar_col)=... Created: 2023-06-19 Updated: 2023-11-23 Resolved: 2023-09-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | 11.3.0 |
| Type: | Task | Priority: | Critical |
| Reporter: | Sergei Petrunia | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | Preview_11.3 | ||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
This is based on a customer case. The issueThe customer query has a (repeating) fragment
Both t1.key_column and t2.column use collation utf8mb3_general_ci. There is an index on t1.key_column, and the above condition is highly-selective. However the optimizer is not able to use it and has to compute a cross join between t1 and t2 and then apply the WHERE. Another query uses UCASE with IN:
How to optimize thisIn general, it is not possible to use an index to make lookups based on
as FOO, foo and FOo are not sorted together in case-sensitive collations. It seems that case-insensitive collations should have this property:
Making a rewrite according to this will solve the issue. The bad news is that CI-UCASE-REMOVAL doesn't always hold. Some collations have exceptions to this rule. See examples below. The good news is that utf8mb3_general_ci and utf8mb4_general_ci do have this property. ExceptionsException #1: Czech collationSee https://collation-charts.org/mysql60/mysql604.utf8_czech_ci.html, CH, Ch and ch are considered a single letter which sorts between H and I. But the pair of characters "cH" is not treated in this way. The effect of this is:
Note that UCASE(t1.a)=UCASE(t2.a) holds while t1.a=t2.a doesn't. If it was other way around, we could use t1.a=t2.a for row retrieval and then check the UCASE(...)=UCASE(...) equality. But alas, if we try to use t1.a=t2.a to read rows we may miss rows. This makes the rewrite completely unapplicable. Exception #2: Danish collationhttps://collation-charts.org/mysql60/mysql604.utf8_danish_ci.html
Exception #3: Turkish collation
The same can be observed on utf8mb3_uca1400_as_ci collation. Further exceptionsCollations *uca1400_*as* are likely to have violations to the rule as well. Doing the rewriteWe can follow the example of Introduce Item_func_eq::varchar_upper_cmp_transformer() and do the rewrite there. Checking the criteria
if the above conditions are met, we can do the rewrite. Doing the rewrite
This seems to be more straightforward than trying to replace UPPER(col) with col. If we replace ourselves, we'll need to care about undoing the rewrite. Rewrites done by transformer will be rolled back automatically. Note that
so we will not change the semantics of the statement. There is a problem with creating Item_func_in: it will copy and then call fix_fields() for the entire IN-list. Using Item_func_in::build_clone doesn't improve the situation much. |
| Comments |
| Comment by Sergei Petrunia [ 2023-06-19 ] | |||||||||
|
First patch: https://github.com/MariaDB/server/tree/bb-10.6-MDEV-31496 | |||||||||
| Comment by Sergei Golubchik [ 2023-06-19 ] | |||||||||
| |||||||||
| Comment by Sergei Petrunia [ 2023-06-30 ] | |||||||||
|
Addressed all input from Bar:
| |||||||||
| Comment by Sergei Petrunia [ 2023-07-06 ] | |||||||||
|
Latest patch:
| |||||||||
| Comment by Pandikrishnan Gurusamy [ 2023-07-12 ] | |||||||||
|
psergei This fix will be part of the ES 10.6.15 release ? | |||||||||
| Comment by Sergei Petrunia [ 2023-07-12 ] | |||||||||
|
Patch with the latest review input addressed:
| |||||||||
| Comment by Sergei Petrunia [ 2023-07-12 ] | |||||||||
|
pandi.gurusamy we do not have that a goal (should we?) but if I get ok to push in time, it will be. | |||||||||
| Comment by Alexander Barkov [ 2023-07-13 ] | |||||||||
|
The patch | |||||||||
| Comment by Pandikrishnan Gurusamy [ 2023-07-14 ] | |||||||||
|
ralf.gebhardt As this feature is disabled by default, are we backporting this fix to 10.6.15? | |||||||||
| Comment by Sergei Petrunia [ 2023-08-16 ] | |||||||||
|
feature tree for 11.3: https://github.com/MariaDB/server/tree/bb-11.3-MDEV-31496-sargable-casefold (there, the feature is ON by default) | |||||||||
| Comment by Lena Startseva [ 2023-08-23 ] | |||||||||
|
Verification is done in branch bb-10.6- | |||||||||
| Comment by Sergei Petrunia [ 2023-08-24 ] | |||||||||
|
Updated variant for 11.3 is in bb-11.3- | |||||||||
| Comment by Sergei Petrunia [ 2023-09-12 ] | |||||||||
|
Pushed into 11.3 |