[MDEV-21265] IN predicate conversion to IN subquery should be allowed for a broader set of datatype comparison Created: 2019-12-10 Updated: 2022-01-09 Resolved: 2020-12-01 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.3.28, 10.4.18, 10.5.9 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Varun Gupta (Inactive) | Assignee: | Varun Gupta (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Description |
| Comments |
| Comment by Sergei Petrunia [ 2020-10-07 ] | ||||||||||||||||
|
Had a discussion with bar about this. Take-aways: The best solution is to get the materialized table field to use the collation that is used to do the IN-comparison. This way, current If we continue to use the original charset of the subquery's select list for materialization, then there are collations that are "subsets" of others. For example, utf8_general_ci is "a subset" of utf8m4_general_ci. The subset relationship is defined as:
Use of '<=' comparison implies equality and in-equality, so the Materialization-with-unique-key process will produce the same set of value groups that it would have produced if it was done using utf8mb4_general_ci. The character set library has a couple of functions that check for a very similar relationship. The check looks like so:
(encoding_allows_reinterpret_as() might be a too strong condition) | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-14 ] | ||||||||||||||||
|
Implemented the above suggestion https://gist.github.com/spetrunia/828d85cec078bf9810f485d43dab4822 and now debugging a testcase:
Let's see what happens when one tries to make a lookup for a UTF8MB4 character in a UTF8 materialized table:
Looking at how the lookup value is constructed:
Character conversion takes place, and it seems to return an appropriate error code. The lookup key returned is '?' (which is stored because the original character could not be converted). But then, I see a lookup is made with the '?' value ? This shouldn't be happening? | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-15 ] | ||||||||||||||||
|
... But there are WHERE condition checks for both semi-join and non-semijoin case which make it harmless (will post details here) | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-10-22 ] | ||||||||||||||||
|
Takeaways from the Tuesday call: | ||||||||||||||||
| Comment by Sergei Petrunia [ 2020-11-26 ] | ||||||||||||||||
|
Review: https://lists.launchpad.net/maria-developers/msg12472.html varun: one more item: please make the commit message mention that it backports some of the code from (and the MDEV which code was backported). | ||||||||||||||||
| Comment by Panagiotis Malakoudis [ 2022-01-09 ] | ||||||||||||||||
|
If I understand correctly, this should be fixed in 10.3.28 and 10.5.9. However, the behaviour in mariadb from debian 11 (10.5.12) is still like the original report. What am I missing here? edit: Just to clarify, I am talking about the behaviour for |