[MDEV-30048] Prefix keys for CHAR work differently for MyISAM vs InnoDB Created: 2022-11-21  Updated: 2023-10-24  Resolved: 2023-10-24

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.8.8, 10.4.32, 10.5.23, 10.6.16, 10.9.8, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2, 11.3.1

Type: Bug Priority: Critical
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-25904 New collation functions to compare In... Closed
relates to MDEV-27670 Assertion `(cs->state & 0x20000) == 0... Closed
relates to MDEV-27768 MDEV-25440: Assertion `(cs->state & 0... Closed
relates to MDEV-30072 Wrong ORDER BY for a partitioned pref... Closed
relates to MDEV-30034 UNIQUE USING HASH accepts duplicate e... Closed
relates to MDEV-30050 Inconsistent results of DISTINCT with... Closed

 Description   

# MyISAM is wrong
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'

# Aria is wrong
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=Aria;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'

# InnoDB is correct
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=InnoDB;
INSERT INTO t1 VALUES ('ss'),('ß');

Query OK, 2 rows affected (0.002 sec)
Records: 2  Duplicates: 0  Warnings: 0

# MEMORY is correct
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=MEMORY;
INSERT INTO t1 VALUES ('ss'),('ß');

Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0

MyISAM and Aria work incorrectly.
InnoDB and HEAP work correctly.

Rationale

We compare two fixed length strings with 10 characters:

  • Two characters 'ss' followed by 8 spaces
  • One 'ß' character followed by 9 spaces

The comparison of 'ss' and 'ß' should work as follows for CHAR(10):

  • 'ss' is equal to 'ß' in this collation
  • 8 spaces are smaller than 9 spaces (because this is a NOPAD collation)
  • the result is "smaller"

Note, left join works correctly:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=InnoDB;
INSERT INTO t1 VALUES ('ss'),('ß');
select * from t1, t1 as t2 where t1.a=t2.a;

+------+------+
| a    | a    |
+------+------+
| ss   | ss   |
| ß    | ß    |
+------+------+

The same problem is repeatable with prefix keys:

# MyISAM is wrong
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'

# Aria is wrong
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=ARIA;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'

# InnoDB is correct
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=InnoDB;
INSERT INTO t1 VALUES ('ss'),('ß');

Query OK, 2 rows affected (0.002 sec)
Records: 2  Duplicates: 0  Warnings: 0

# MEMORY is correct
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=MEMORY;
INSERT INTO t1 VALUES ('ss'),('ß');

Query OK, 2 rows affected (0.000 sec)
Records: 2  Duplicates: 0  Warnings: 0



 Comments   
Comment by Alexander Barkov [ 2022-11-21 ]

Heap works like InnoDB (this fact is now also noted in above description) :

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=HEAP;
INSERT INTO t1 VALUES ('ss'),('ß');

Query OK, 2 rows affected (0.000 sec)
Records: 2  Duplicates: 0  Warnings: 0

Comment by Alexander Barkov [ 2023-04-07 ]

Hello serg,

Can you please review a patch:
https://github.com/MariaDB/server/commit/56fa1da9c676ebc480b5cb43627752150010cf4c

Thanks.

Comment by Rick James [ 2023-10-17 ]

I hope this is not a repeat of the fiasco many years ago utf8_general_mysql500_ci .

Comment by Alexander Barkov [ 2023-10-18 ]

rjasdfiii, which fiasco do you mean?

Comment by Rick James [ 2023-10-18 ]

@AlexanderBarkov - Many users experienced "duplicate key" in their PRIMARY KEY due to this incompatible change.

----- 2012-04-10 5.6.5 Milestone 8 & 2012-03-21 5.1.62 & 2012-02-17 5.5.21 – Functionality Added or Changed – -----

New utf8_general_mysql500_ci and ucs2_general_mysql500_ci collations have been added that preserve the behavior of utf8_general_ci and ucs2_general_ci from versions of MySQL previous to 5.1.24. [CHECK TABLE](https://dev.mysql.com/doc/refman/5.7/en/check-table.html) produces this error:

----- 2008-11-14 5.1.30

The CHECK TABLE ... FOR UPGRADE statement did not check for incompatible collation changes made in MySQL 5.1.24 (Bug #27877). This also affects mysqlcheck and mysql_upgrade, which cause that statement to be executed. See Checking Whether Tables or Indexes Must Be Rebuilt.
Prior to this fix, a binary upgrade (performed without dumping tables with mysqldump before the upgrade and reloading the dump file after the upgrade) would corrupt tables that have indexes that use the utf8_general_ci or ucs2_general_ci collation for columns that contain 'ß' LATIN SMALL LETTER SHARP S (German). After the fix, CHECK TABLE ... FOR UPGRADE properly detects the problem and warns about tables that need repair.

However, the fix is not backward compatible and can result in a downgrading problem under these circumstances:

Perform a binary upgrade to a version of MySQL that includes the fix.

Run CHECK TABLE ... FOR UPGRADE (or mysqlcheck or mysql_upgrade) to upgrade tables.

Perform a binary downgrade to a version of MySQL that does not include the fix.

The solution is to dump tables with mysqldump before the downgrade and reload the dump file after the downgrade. Alternatively, drop and recreate affected indexes. (Bug #40053)

Comment by Sergei Golubchik [ 2023-10-23 ]

99ffd7e3934 is ok to push.

I'd still suggest to split the commit in two — a cleanup commit that introduces all functions in my_compare.h and replaces them as needed, but does not change the behavior (doesn't use ha_compare_char_fixed for HA_KEYTYPE_TEXT). And the second commit that fixes the bug.

Generated at Thu Feb 08 10:13:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.