Details
Bug
Status: Open
Major
Resolution: Unresolved
10.2, 10.3, 10.4
None
Description
Filing the contraction problem mentioned in MDEV-18899 as a separate issue.
Contraction problem
Also, the underlying code should be checked for contraction compatibility. The code that copies values to column_statistics.min_value and column_statistics.max_value should make sure not to break contractions in the middle; otherwise the stored value, for example max_value, can be very far from the actual maximum value.
For example, consider this data in combination with Czech collation:
CONCAT(REPEAT('x',254), 'ch')
'ch' is a separate letter which is sorted between 'h' and 'i':
http://collation-charts.org/mysql60/mysql604.utf8_czech_ci.html
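To make that ordering concrete, here is a toy Python model. It is an assumption for illustration, not MariaDB's utf8_czech_ci implementation: it handles only lowercase ASCII letters plus the 'ch' contraction, placed between 'h' and 'i'. The names ALPHABET, collation_elements() and sort_key() are hypothetical.

```python
# Toy Czech-style ordering (illustration only, NOT MariaDB's utf8_czech_ci).
# Only lowercase ASCII letters are modelled, plus the 'ch' contraction,
# which is ranked as its own letter between 'h' and 'i'.
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def collation_elements(s: str) -> list[str]:
    """Split s into letters, treating 'ch' as a single letter (contraction)."""
    out = []
    i = 0
    while i < len(s):
        if s[i:i + 2] == "ch":
            out.append("ch")
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out


def sort_key(s: str) -> list[int]:
    """Rank each letter; Python compares the resulting lists lexicographically."""
    return [RANK[letter] for letter in collation_elements(s)]


print(sorted(["i", "ch", "h", "c"], key=sort_key))  # ['c', 'h', 'ch', 'i']
```

In this model, a value ending in 'c' sorts well before the same value ending in 'ch', because 'h' and every other letter between them come in between.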
'ch' should not be broken into parts when copying to column_statistics.min_value:
'c' cannot be the 255th (last) byte stored in column_statistics.min_value, because it is followed by 'h' in the original full-length value. The copying code should store only the REPEAT('x',254) part.
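A minimal Python sketch of that copying rule, assuming a hypothetical truncate_for_min() helper and a contraction list that contains only 'ch'. It walks the value in indivisible units (a whole contraction or a single character) and stops before the first unit that would exceed the byte limit. This is not MariaDB's actual code; a real implementation would take the contractions from the collation itself.

```python
# Hypothetical sketch, not MariaDB's code. The contraction list is an
# assumption: only the Czech 'ch' is modelled.
CONTRACTIONS = ("ch",)


def split_units(s: str) -> list[str]:
    """Split s into indivisible units: contractions or single characters."""
    units = []
    i = 0
    while i < len(s):
        for c in CONTRACTIONS:
            if s.startswith(c, i):
                units.append(c)
                i += len(c)
                break
        else:  # no contraction starts at position i
            units.append(s[i])
            i += 1
    return units


def truncate_for_min(s: str, max_bytes: int) -> str:
    """Keep whole units while their UTF-8 encoding fits in max_bytes."""
    out = []
    used = 0
    for unit in split_units(s):
        size = len(unit.encode("utf-8"))
        if used + size > max_bytes:
            break
        out.append(unit)
        used += size
    return "".join(out)


value = "x" * 254 + "ch"                        # 256 bytes, limit is 255
naive = value.encode("utf-8")[:255].decode("utf-8")
print(naive.endswith("c"))                      # True: a byte cut splits 'ch'
stored = truncate_for_min(value, 255)
print(len(stored.encode("utf-8")))              # 254: only REPEAT('x',254)
```

Because units are never split, the same helper also avoids cutting a multi-byte UTF-8 character in half.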
For column_statistics.max_value, copying is even harder: the code should replace 'ch' with the character that immediately follows 'ch' in the collation, which is 'i'.
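A sketch of the max_value rule under similar assumptions: a hypothetical truncate_for_max() with a successor table that only maps 'ch' to 'i' (per the collation chart). When the unit that does not fit is a contraction, it stores the successor letter instead, so the stored value still sorts after the original. It covers only the contraction case described here; when an ordinary character is cut, the result is still a plain prefix, which is not an upper bound, and this sketch does not try to fix that.

```python
# Hypothetical sketch, not MariaDB's code. The successor table is an
# assumption: only Czech 'ch' -> 'i' is modelled, and its keys double as
# the list of contractions.
SUCCESSOR = {"ch": "i"}


def truncate_for_max(s: str, max_bytes: int) -> str:
    out = []
    used = 0
    i = 0
    while i < len(s):
        # Take a whole contraction if one starts here, else one character.
        unit = next((c for c in SUCCESSOR if s.startswith(c, i)), s[i])
        size = len(unit.encode("utf-8"))
        if used + size > max_bytes:
            # The unit does not fit. If it is a contraction, store the letter
            # that follows it in the collation, provided that letter fits.
            repl = SUCCESSOR.get(unit)
            if repl is not None and used + len(repl.encode("utf-8")) <= max_bytes:
                out.append(repl)
            break
        out.append(unit)
        used += size
        i += len(unit)
    return "".join(out)


stored = truncate_for_max("x" * 254 + "ch", 255)
print(stored[-3:], len(stored.encode("utf-8")))  # xxi 255
```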
An example
CREATE OR REPLACE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8, comment TEXT);
INSERT INTO t1 VALUES ('aa','This is MIN'), ('aë','This is MAX');
SELECT a,comment FROM t1 ORDER BY a;
+------+-------------+
| a    | comment     |
+------+-------------+
| aa   | This is MIN |
| aë   | This is MAX |
+------+-------------+

SELECT CASE WHEN a='aë' THEN 'a' ELSE a END AS a_2_byte,comment FROM t1 ORDER BY 1;
+----------+-------------+
| a_2_byte | comment     |
+----------+-------------+
| a        | This is MAX |
| aa       | This is MIN |
+----------+-------------+
The limit used is 2 bytes here, instead of 255, for simplicity.
Note that the original MIN and MAX values are 2 bytes ('aa') and 3 bytes ('aë') long, respectively.
Now, if we cut them to 2 bytes in a multi-byte safe way, we get:
- 'aa' is still 'aa'
- 'aë' becomes 'a', which makes it smaller than 'aa'
So when values are cut in a multi-byte safe way, MIN and MAX can change their relative order: here the truncated MAX sorts before the truncated MIN.
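The 2-byte cut can be reproduced outside the server with a short Python sketch. cut_utf8() is a hypothetical helper that truncates the UTF-8 bytes and drops a trailing partial character. Python's code-point string comparison stands in for the utf8 collation here; for 'a', 'aa' and 'aë' it gives the same order as the SELECT output above.

```python
def cut_utf8(s: str, max_bytes: int) -> str:
    """Longest prefix of s whose UTF-8 encoding fits in max_bytes."""
    data = s.encode("utf-8")[:max_bytes]
    # errors="ignore" drops the trailing partial multi-byte sequence, if any.
    return data.decode("utf-8", errors="ignore")


min_value, max_value = "aa", "a\u00eb"  # 'aa' and 'aë'
print(len(min_value.encode("utf-8")), len(max_value.encode("utf-8")))  # 2 3
cut_min, cut_max = cut_utf8(min_value, 2), cut_utf8(max_value, 2)
print(cut_min, cut_max)   # aa a
print(cut_max < cut_min)  # True: the stored MAX now sorts before the stored MIN
```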