[MDEV-30095] Unexpected duplicate entry error for UNIQUE USING HASH + NOPAD Created: 2022-11-25  Updated: 2022-11-25

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-25904 New collation functions to compare In... Closed
Relates
relates to MDEV-371 Unique indexes for blobs Closed
relates to MDEV-30072 Wrong ORDER BY for a partitioned pref... Closed

 Description   

This script returns an unexpected duplicate entry error:

CREATE OR REPLACE TABLE t1 (
  a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
  UNIQUE KEY(a(33000)) USING HASH
);
INSERT INTO t1 VALUES (CONCAT('td',SPACE(31696)));
INSERT INTO t1 VALUES (CONCAT('td',SPACE(31728)));

ERROR 1062 (23000): Duplicate entry 'td                                                           ...' for key 'a'

The two inserted strings cannot be equal because this is a NOPAD collation - different amounts of trailing spaces make the longer string greater. Therefore, the duplicate entry error is not correct.

The same problem is repeatable with these two strings:

CREATE OR REPLACE TABLE t1 (
  a LONGTEXT CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci,
  UNIQUE KEY(a(33000)) USING HASH
);
INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32457)));
INSERT INTO t1 VALUES (CONCAT('xh',SPACE(32489)));

ERROR 1062 (23000): Duplicate entry 'xh                                                           ...' for key 'a'

Problem details

These two data pairs are special - they were found programatically.

The idea is that these two expressions produce the same hash value 10067658370460279189 in the collation latin1_swedish_nopad_ci:

  • CONCAT('td',SPACE(31696))
  • CONCAT('td',SPACE(31728))

The problem is that the further duplicate hash resolution (performed to know if the actual values are really equal) erroneously treat these strings as equal using strnncollsp_nchars(). Using this function is correct for the CHAR data type, but it's not correct for VARCHAR/TEXT.

The other two expressions also produce equal hash value 1452848758766898498 in the collation latin1_swedish_nopad_ci:

  • CONCAT('xh',SPACE(32457))
  • CONCAT('xh',SPACE(32489))

Generated at Thu Feb 08 10:13:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.