[MDEV-26743] InnoDB: CHAR+nopad does not work well - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5(EOL), 10.6, 10.7(EOL)
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1
Component/s: Character Sets, Storage Engine - InnoDB
Labels:
None

Description

InnoDB does not perform well primary key uniqueness on CHAR columns with NOPAD collations.

Basic latin letter vs equal accented letter

SET NAMES utf8mb3;

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;

INSERT INTO t1 VALUES ('a'),('ä');

Query OK, 2 rows affected (0.001 sec)

Records: 2  Duplicates: 0  Warnings: 0

Looks wrong. The expected result is to throw a duplicate key error.
Both values have only one character, so both are padded with one extra space when stored to the column.
The values 'a ' and 'ä ' are equal in this collation:

SELECT _utf8mb3 'a ' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ä ';

+------+

| c1   |

+------+

| a    | <- notice only one row

+------+

Two letters vs equal (but space padded) expansion

SET NAMES utf8mb3;

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;

INSERT INTO t1 VALUES ('ss'),('ß');

ERROR 1062 (23000): Duplicate entry 'ß' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values.
The 'ss' has two characters. It is stored to the column as is. The 'ß' has only one character, so it is padded with one space to 'ß '.
The values 'ss' and 'ß ' are different in this collation:

SELECT _utf8mb3 'ss' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ß ';

+------+

| c1   |

+------+

| ss   |

| ß    |

+------+

Basic latin letter (but followed by an ignorable character) vs equal accented letter

SET NAMES utf8mb3;

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (a CHAR(3), PRIMARY KEY(a)) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;

INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values.
The first value CONCAT('a',_utf8mb3 0x01) returns two characters, so it gets padded with a space when stored to the column. So the value which is actually stored is CONCAT('a',_utf8mb3 0x01,' ').
The 'ä' is just one character, so it gets padded with two spaces to 'ä ' when stored. These two values are different in this collation:

SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01, ' ') COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä  ') td1;

+----------+

| HEX(c1)  |

+----------+

| 610120   |

| C3A42020 |

+----------+

SET NAMES utf8mb3;

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;

INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');

ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'

Looks wrong. The expected result is to accept both values.
The first value CONCAT('a',_utf8mb3 0x01) consists of two characters. It is stored as is.
The 'ä' is just one character, so it gets padded with one spaces to 'ä ' when stored. These two values are different in this collation:

SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01) COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä ') td1;

+---------+

| HEX(c1) |

+---------+

| 6101    |

| C3A420  |

+---------+

Attachments

Issue Links

is caused by

MDEV-9711 NO PAD collations

Closed

relates to

MDEV-25440 Assertion `cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0' failed in PageBulk::insert

Closed

MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings

Closed

MDEV-26744 MyISAM, Aria, MEMORY: CHAR+nopad does not work well

Open

MDEV-25449 Add MY_COLLATION_HANDLER::strnncollsp_nchars()

Closed

Activity

People

Assignee:: Alexander Barkov

Reporter:: Alexander Barkov

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2021-10-01 12:36

Updated:: 2023-11-13 16:06

Resolved:: 2023-11-10 02:54

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

3.5h

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.