[MDEV-26294] Duplicate entries in unique index not detected when changing collation with INPLACE / NOCOPY algoritm Created: 2021-08-02  Updated: 2022-08-31  Resolved: 2022-07-04

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 10.6.0, 10.4.20, 10.5.11
Fix Version/s: 10.4.26, 10.5.17, 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.0

Type: Bug Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Marko Mäkelä
Resolution: Fixed Votes: 2
Labels: corruption, regression-10.4

Issue Links:
Blocks
blocks MDEV-22365 Assertion `!col->mbmaxlen || len >= c... Closed
is blocked by MDEV-15250 UPSERT during ALTER-TABLE results in ... Closed
Duplicate
is duplicated by MDEV-27209 Assertion `((cursor)->page_cur.rec) =... Closed
is duplicated by MDEV-27280 server crashes on CHECK TABLE after C... Closed
Problem/Incident
causes MDEV-29425 Buffer overflow in dict_index_t::col_... Closed
is caused by MDEV-17301 Change of COLLATE unnecessarily requi... Closed
Relates
relates to MDEV-15564 Avoid table rebuild in ALTER TABLE on... Closed
relates to MDEV-18584 Avoid copying when altering CHAR colu... Confirmed

 Description   

When creating a table with a case sensitive collation on a column with unique constraint, then trying to alter the collation to a case insensitive one, entries that only differ in case lead to a duplicate entry error, aborting the ALTER TABLE, when using the COPY algorithm, as expected:

CREATE TABLE t1 (
  id INT PRIMARY KEY,
  msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin,
  UNIQUE(msg)
) ENGINE=INNODB;
 
INSERT INTO t1 VALUES (1, 'aaa');
INSERT INTO t1 VALUES (2, 'AAA');
 
ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=COPY;

This gives

ERROR 1062 (23000): Duplicate entry 'AAA' for key 'msg'

as expected.

When trying the same with ALGORITHM=INPLACE or NOCOPY it works without detecting that we now have a duplicate entry though, so the new table version may now contain data violating the UNIQUE constraint.



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2021-08-09 ]

Looks like this issue is caused the following patch:

commit 854c219a7f0e1878517d5a821992f650342380dd
Author: Eugene Kosov <claprix@yandex.ru>
Date:   Fri Jun 14 12:18:49 2019 +0300
 
    MDEV-17301 Change of COLLATE unnecessarily requires ALGORITHM=COPY

Comment by Alexander Barkov [ 2021-10-19 ]

This problem is also repeatable when altering between two _ci collations:

SET NAMES utf8;
CREATE OR REPLACE TABLE t1 (
  id INT PRIMARY KEY,
  msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci,
  UNIQUE(msg)
) ENGINE=INNODB;
INSERT INTO t1 VALUES (1, 'aaa');
INSERT INTO t1 VALUES (2, 'ååå');
ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=COPY;

ERROR 1062 (23000): Duplicate entry 'ååå' for key 'msg'

The error is correct:

  • in utf8_swedish_ci, 'å' is a separate letter which is sorted after 'z'
  • in utf8_unicode_ci, 'å' is a variant of 'a'

Now I use ALGORITHM=NOCOPY:

ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY;

Query OK, 0 rows affected (0.009 sec)
Records: 0  Duplicates: 0  Warnings: 0

Looks wrong.

See:
https://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html
https://collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html

Comment by Thirunarayanan Balathandayuthapani [ 2022-02-10 ]

I took a look at the given patch(https://github.com/MariaDB/server/commit/ab2414a6ceb376a8c03569a7d63d747ff215afdc).
Problem with the patch is that InnoDB has to get the precise type list for row0merge.cc and row0log.cc.

Moreover, column in the index points to old collation precise type
and it could lead to crash in pageBulk::insert() whether the
records are in ascending order.(MDEV-27280)

Test case is

--source include/have_innodb.inc
CREATE TABLE t1 (
  id INT PRIMARY KEY,
  msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin,
  id_2 INT not null,
  unique index(msg, id_2)
) ENGINE=INNODB;
 
INSERT INTO t1 VALUES (1, 'aaa', 2);
INSERT INTO t1 VALUES (2, 'AAA', 3);
ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY;
DROP TABLE t1;
 
Assert:
 
#3  0x00007ffff5aef502 in __GI___assert_fail (assertion=0x555556ed72b0 "cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0", file=0x555556ed71c8 "/home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc", line=200, function=0x555556ed8280 <PageBulk::insert(unsigned char const*, unsigned short*)::__PRETTY_FUNCTION__> "void PageBulk::insert(const rec_t*, rec_offs*)") at assert.c:101
#4  0x0000555556756183 in PageBulk::insert (this=0x7fffa005d120, rec=0x7fffa005d73f "AAA\200", offsets=0x7fffa005d750) at /home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc:200
#5  0x0000555556758937 in BtrBulk::insert (this=0x7fffeae62710, tuple=0x7fffa05d66e0, level=0) at /home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc:988

InnoDB DDL has to have the index which has changed column collation value. Then only we could avoid this assert and re-arrange the record depends on
new collation.

Based on the approach, current patch does create new column in inplace_alter heap and index points to the new column.
It leads to reloading of table during commit_inplace_alter_table(). This approach has some drawbacks when
we're dealing with concurrent DML. Newly created column won't have relation to the table and it could
lead to many crashes for concurrent DML. Patch is in bb-10.6-MDEV-26294

Correct approach would be that InnoDB DDL has to use newly created column (collation changed one) and
concurrent DML has to use old column. By using this approach, InnoDB could achieve that
nocopy + instant operation can be done in nocopy algorithm.

Comment by Marko Mäkelä [ 2022-07-01 ]

thiru, your fix looked correct to me. I took the liberty to improve some comments and simplify the code. Please check if it looks OK to you.

Comment by Marko Mäkelä [ 2022-07-04 ]

In MariaDB Server 10.4.26 and 10.5.7, the bug will be fixed by refusing native ALTER TABLE when the collation of a column is changed such that it remains or becomes indexed:

CREATE TABLE t1 (
  id INT PRIMARY KEY,
  msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci UNIQUE
) ENGINE=INNODB;
INSERT INTO t1 VALUES (1, 'aaa');
INSERT INTO t1 VALUES (2, 'ååå');
--error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
ALTER TABLE t1
MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
ALGORITHM=NOCOPY;
ALTER TABLE t1 DROP INDEX msg,
MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
ALGORITHM=NOCOPY;

In MariaDB Server 10.6.9, the collation change will remain allowed, while any affected secondary indexes will be correctly (re)built online.

Generated at Thu Feb 08 09:44:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.