[MDEV-30072] Wrong ORDER BY for a partitioned prefix key + NOPAD Created: 2022-11-22  Updated: 2023-04-06  Resolved: 2023-04-06

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, Partitioning
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.11, 11.0
Fix Version/s: 11.1.1, 10.11.3, 11.0.2, 10.4.29, 10.5.20, 10.6.13, 10.7.8, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Critical
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: regression-10.4

Issue Links:
Problem/Incident
is caused by MDEV-25904 New collation functions to compare In... Closed
Relates
relates to MDEV-27670 Assertion `(cs->state & 0x20000) == 0... Closed
relates to MDEV-30095 Unexpected duplicate entry error for ... Open
relates to MDEV-30034 UNIQUE USING HASH accepts duplicate e... Closed
relates to MDEV-30048 Prefix keys for CHAR work differently... Closed
relates to MDEV-30050 Inconsistent results of DISTINCT with... Closed

 Description   

I create a table:

CREATE OR REPLACE TABLE t1
(
  id INT,
  data VARCHAR(20),
  KEY data_id (data,id)
) COLLATE utf8mb3_unicode_nopad_ci ENGINE=MyISAM
PARTITION BY RANGE COLUMNS (id)
(
  PARTITION p10 VALUES LESS THAN (20),
  PARTITION p20 VALUES LESS THAN MAXVALUE
);
INSERT INTO t1 VALUES (30, 'ss '), (10, 'ß ');

Now I run ORDER BY queries:

SELECT id FROM t1 WHERE data='ss ' ORDER BY id;

+------+
| id   |
+------+
|   10 |
|   30 |
+------+

SELECT id FROM t1 WHERE data='ss ' ORDER BY id DESC;

+------+
| id   |
+------+
|   30 |
|   10 |
+------+

Looks good so far.

Now I alter the table changing the full key to a prefix key:

ALTER TABLE t1 DROP KEY data_id, ADD KEY data_id2(data(10),id);

And return ORDER BY queries again:

SELECT id FROM t1 WHERE data='ss ' ORDER BY id;

+------+
| id   |
+------+
|   30 |
|   10 |
+------+

SELECT id FROM t1 WHERE data='ss ' ORDER BY id DESC;

+------+
| id   |
+------+
|   10 |
|   30 |
+------+

Notice the order changed to opposite.



 Comments   
Comment by Alexander Barkov [ 2022-11-22 ]

The problem was caused by MDEV-25904, by this chunk for Field_varstring (and a similar chunk for Field_blob)

@@ -7880,8 +7854,12 @@ int Field_varstring::cmp_prefix(const uchar *a_ptr, const
 uchar *b_ptr,
     a_length= uint2korr(a_ptr);
     b_length= uint2korr(b_ptr);
   }
-  return cmp_str_prefix(a_ptr+length_bytes, a_length, b_ptr+length_bytes,
-                        b_length, prefix_len, field_charset);
+  return field_charset->coll->strnncollsp_nchars(field_charset,
+                                                 a_ptr + length_bytes,
+                                                 a_length,
+                                                 b_ptr + length_bytes,
+                                                 b_length,
+                                                 prefix_len / field_charset->mbmaxlen);
 }
 

This change was OK for PAD SPACE collations and improved performance for VARCHAR/TEXT columns.

But it appeared not to be correct for NOPAD collations - it changed the behaviour to wrong.

It seems, to combine both performance and correct behavior, instead of one function strnncollsp_nchars() we need two separate functions:

  • strnncollsp_nchars_char()
  • strnncollsp_nchars_varchar()

Both should truncate too long strings.
However, only the CHAR version should (virtually) pad too short strings with trailing spaces.

Comment by Alexander Barkov [ 2022-11-23 ]

Also repeatable with TEXT:

CREATE OR REPLACE TABLE t1
(
  id INT,
  data TEXT,
  KEY data_id (data(10),id)
) COLLATE utf8mb3_unicode_nopad_ci ENGINE=MyISAM
PARTITION BY RANGE COLUMNS (id)
(
  PARTITION p10 VALUES LESS THAN (20),
  PARTITION p20 VALUES LESS THAN MAXVALUE
);
INSERT INTO t1 VALUES (30, 'ss '), (10, 'ß ');
SELECT id FROM t1 WHERE data='ss ' ORDER BY id;
SELECT id FROM t1 WHERE data='ss ' ORDER BY id DESC;

MariaDB [test]> SELECT id FROM t1 WHERE data='ss ' ORDER BY id;
+------+
| id   |
+------+
|   30 |
|   10 |
+------+
2 rows in set (0.002 sec)
 
MariaDB [test]> SELECT id FROM t1 WHERE data='ss ' ORDER BY id DESC;
+------+
| id   |
+------+
|   10 |
|   30 |
+------+
2 rows in set (0.002 sec)

Generated at Thu Feb 08 10:13:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.