Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL)
-
None
Description
InnoDB does not perform well primary key uniqueness on CHAR columns with NOPAD collations.
Basic latin letter vs equal accented letter
SET NAMES utf8mb3; |
DROP TABLE IF EXISTS t1; |
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT; |
INSERT INTO t1 VALUES ('a'),('ä'); |
Query OK, 2 rows affected (0.001 sec)
|
Records: 2 Duplicates: 0 Warnings: 0
|
Looks wrong. The expected result is to throw a duplicate key error.
Both values have only one character, so both are padded with one extra space when stored to the column.
The values 'a ' and 'ä ' are equal in this collation:
SELECT _utf8mb3 'a ' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ä '; |
+------+
|
| c1 |
|
+------+
|
| a | <- notice only one row
|
+------+
|
Two letters vs equal (but space padded) expansion
SET NAMES utf8mb3; |
DROP TABLE IF EXISTS t1; |
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT; |
INSERT INTO t1 VALUES ('ss'),('ß'); |
ERROR 1062 (23000): Duplicate entry 'ß' for key 'PRIMARY'
|
Looks wrong. The expected result is to accept both values.
The 'ss' has two characters. It is stored to the column as is. The 'ß' has only one character, so it is padded with one space to 'ß '.
The values 'ss' and 'ß ' are different in this collation:
SELECT _utf8mb3 'ss' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ß '; |
+------+
|
| c1 |
|
+------+
|
| ss |
|
| ß |
|
+------+
|
Basic latin letter (but followed by an ignorable character) vs equal accented letter
SET NAMES utf8mb3; |
DROP TABLE IF EXISTS t1; |
CREATE TABLE t1 (a CHAR(3), PRIMARY KEY(a)) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT; |
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä'); |
ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'
|
Looks wrong. The expected result is to accept both values.
The first value CONCAT('a',_utf8mb3 0x01) returns two characters, so it gets padded with a space when stored to the column. So the value which is actually stored is CONCAT('a',_utf8mb3 0x01,' ').
The 'ä' is just one character, so it gets padded with two spaces to 'ä ' when stored. These two values are different in this collation:
SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01, ' ') COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä ') td1; |
+----------+
|
| HEX(c1) |
|
+----------+
|
| 610120 |
|
| C3A42020 |
|
+----------+
|
SET NAMES utf8mb3; |
DROP TABLE IF EXISTS t1; |
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT; |
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä'); |
ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'
|
Looks wrong. The expected result is to accept both values.
The first value CONCAT('a',_utf8mb3 0x01) consists of two characters. It is stored as is.
The 'ä' is just one character, so it gets padded with one spaces to 'ä ' when stored. These two values are different in this collation:
SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01) COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä ') td1; |
+---------+
|
| HEX(c1) |
|
+---------+
|
| 6101 |
|
| C3A420 |
|
+---------+
|
Attachments
Issue Links
- is caused by
-
MDEV-9711 NO PAD collations
- Closed
- relates to
-
MDEV-25440 Assertion `cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0' failed in PageBulk::insert
- Closed
-
MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings
- Closed
-
MDEV-26744 MyISAM, Aria, MEMORY: CHAR+nopad does not work well
- Open
-
MDEV-25449 Add MY_COLLATION_HANDLER::strnncollsp_nchars()
- Closed