[MDEV-26743] InnoDB: CHAR+nopad does not work well Created: 2021-10-01  Updated: 2023-11-13  Resolved: 2023-11-10

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-9711 NO PAD collations Closed
Relates
relates to MDEV-25440 Assertion `cmp_rec_rec(rec, old_rec, ... Closed
relates to MDEV-25904 New collation functions to compare In... Closed
relates to MDEV-26744 MyISAM, Aria, MEMORY: CHAR+nopad does... Open
relates to MDEV-25449 Add MY_COLLATION_HANDLER::strnncollsp... Closed

 Description   

InnoDB does not perform well primary key uniqueness on CHAR columns with NOPAD collations.

Basic latin letter vs equal accented letter

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
INSERT INTO t1 VALUES ('a'),('ä');

Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0

Looks wrong. The expected result is to throw a duplicate key error.
Both values have only one character, so both are padded with one extra space when stored to the column.
The values 'a ' and 'ä ' are equal in this collation:

SELECT _utf8mb3 'a ' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ä ';

+------+
| c1   |
+------+
| a    | <- notice only one row
+------+

Two letters vs equal (but space padded) expansion

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values.
The 'ss' has two characters. It is stored to the column as is. The 'ß' has only one character, so it is padded with one space to 'ß '.
The values 'ss' and 'ß ' are different in this collation:

SELECT _utf8mb3 'ss' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ß ';

+------+
| c1   |
+------+
| ss   |
| ß    |
+------+

Basic latin letter (but followed by an ignorable character) vs equal accented letter

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(3), PRIMARY KEY(a)) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values.
The first value CONCAT('a',_utf8mb3 0x01) returns two characters, so it gets padded with a space when stored to the column. So the value which is actually stored is CONCAT('a',_utf8mb3 0x01,' ').
The 'ä' is just one character, so it gets padded with two spaces to 'ä ' when stored. These two values are different in this collation:

SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01, ' ') COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä  ') td1;

+----------+
| HEX(c1)  |
+----------+
| 610120   |
| C3A42020 |
+----------+

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values.
The first value CONCAT('a',_utf8mb3 0x01) consists of two characters. It is stored as is.
The 'ä' is just one character, so it gets padded with one spaces to 'ä ' when stored. These two values are different in this collation:

SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01) COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä ') td1;

+---------+
| HEX(c1) |
+---------+
| 6101    |
| C3A420  |
+---------+


Generated at Thu Feb 08 09:47:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.