[MDEV-26744] MyISAM, Aria, MEMORY: CHAR+nopad does not work well Created: 2021-10-01  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Character Sets, Storage Engine - Aria, Storage Engine - MyISAM
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-25440 Assertion `cmp_rec_rec(rec, old_rec, ... Closed
relates to MDEV-25449 Add MY_COLLATION_HANDLER::strnncollsp... Closed
relates to MDEV-25904 New collation functions to compare In... Closed
relates to MDEV-26743 InnoDB: CHAR+nopad does not work well Closed

 Description   

This bug is similar to MDEV-26743, but for MyISAM.

The same problem is repeatable with:

  • ENGINE=Aria
  • ENGINE=MEMORY in combination with BTREE index algorithm.

Basic Latin letter vs equal accented letter

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=MyISAM;
INSERT INTO t1 VALUES ('a'),('ä');

Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0

Looks wrong. The expected result is to throw a duplicate key error. See MDEV-26743 for details.
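
To see what the SQL layer itself thinks of the two values, independently of the index, they can be compared directly as literals. This is only a sketch: the column alias is made up for illustration, and the collation name is the one used in the CREATE TABLE statement above.

```sql
SET NAMES utf8mb3;
-- Compare the two literals under the same collation as the table.
-- A result of 1 would mean the values are equal for the SQL layer,
-- which is what the expected duplicate key error implies.
SELECT 'a' = 'ä' COLLATE utf8_unicode_nopad_ci AS equal_in_sql_layer;
```

If this query reports the values as equal while the INSERT above accepts both rows, the index comparison and the SQL-layer comparison disagree.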

Two letters vs equal (but space padded) expansion

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=MyISAM;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values. See MDEV-26743 for details.

Basic Latin letter (but followed by an ignorable character) vs equal accented letter

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(3), PRIMARY KEY(a)) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_nopad_ci ENGINE=MyISAM;
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values. See MDEV-26743 for details.

The same scenario with a CHAR(2) column:

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=MyISAM;
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values. See MDEV-26743 for details.
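
The description says the same problem is repeatable with ENGINE=Aria but gives no script for it. A minimal sketch, assuming the MyISAM scripts above carry over with only the engine clause changed (first scenario shown; the others would be adapted the same way):

```sql
SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
-- Same table definition as the first MyISAM script, with only the engine changed.
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=Aria;
-- The expected result, as for MyISAM, is a duplicate key error.
INSERT INTO t1 VALUES ('a'),('ä');
```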



 Comments   
Comment by Alexander Barkov [ 2021-10-01 ]

Scripts to reproduce the problem with ENGINE=MEMORY with BTREE indexes:

Basic Latin letter vs equal accented letter

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a) USING BTREE) COLLATE utf8_unicode_nopad_ci ENGINE=MEMORY;
INSERT INTO t1 VALUES ('a'),('ä');

Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0

Looks wrong. The expected result is to throw a duplicate key error. See MDEV-26743 for details.

Two letters vs equal (but space padded) expansion

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a) USING BTREE) COLLATE utf8_unicode_nopad_ci ENGINE=MEMORY;
INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values. See MDEV-26743 for details.

Basic Latin letter (but followed by an ignorable character) vs equal accented letter

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(3), PRIMARY KEY(a) USING BTREE) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_nopad_ci ENGINE=MEMORY;
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values. See MDEV-26743 for details.

The same scenario with a CHAR(2) column:

SET NAMES utf8mb3;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a) USING BTREE) COLLATE utf8_unicode_nopad_ci ENGINE=MEMORY;
INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values. See MDEV-26743 for details.
