[MDEV-30046] wrong row targeted with "insert ... on duplicate" and "replace", leading to data corruption Created: 2022-11-21  Updated: 2024-01-03

Status: Stalled
Project: MariaDB Server
Component/s: Data Manipulation - Insert
Affects Version/s: 10.5.15, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Critical
Reporter: Frank Heckenbach Assignee: Nikita Malyavin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Linux 5.10.0-14-amd64, Debian 11.5


Issue Links:
Duplicate
is duplicated by MDEV-30588 Failed to update duplicate data when ... Closed
Relates
relates to MDEV-371 Unique indexes for blobs Closed
relates to MDEV-17395 REPLACE/INSERT ODKU: support WITHOUT ... Stalled
relates to MDEV-18748 REPLACE doesn't work with unique blob... Closed
relates to MDEV-31093 "ON DUPLICATE KEY UPDATE" saves wrong... Open

 Description   

(Also reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015293)

Using the MySQL interface, these statements:

DROP TABLE IF EXISTS t;
CREATE TABLE t (s BLOB, n INT, UNIQUE (s));
INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
INSERT INTO t VALUES ('Hrecvx_0004mm-00',2) ON DUPLICATE KEY UPDATE n = VALUES (n);
SELECT * FROM t;

produce this output:

s n
Hrecvx_0004ln-00 2
Hrecvx_0004mm-00 1

So the latter "INSERT" updates the wrong row.

This happens whether the first column is "BLOB" or "TEXT", but only
with specific values. (In my actual use case with ~1 million rows,
it happened a few dozen times, which might be consistent e.g. with
collisions of a 32 bit hash or so.)

Likewise, these statements:

DROP TABLE IF EXISTS t;
CREATE TABLE t (s BLOB, n INT, UNIQUE (s));
INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2);
SELECT * FROM t;

give the error:

ERROR 1062 (23000) at line 4: Duplicate entry 'Hrecvx_0004mm-00' for key 's'

In my understanding, this error should actually be impossible with
"REPLACE INTO".

It might be the same issue, i.e. it tries to delete the wrong row
before inserting the new one, so it's still duplicate.



 Comments   
Comment by Alice Sherepa [ 2022-11-21 ]

Thanks for the report! I repeated as described on 10.4-10.11, InnoDB. Probably caused by MDEV-371

--source include/have_innodb.inc 
CREATE TABLE t (s BLOB, n INT, UNIQUE (s)) engine=innodb;
INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2);

10.4 0d586d62e5326053383

CURRENT_TEST: main.1_my
mysqltest: At line 9: query 'REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2)' failed: 1062: Duplicate entry 'Hrecvx_0004mm-00' for key 's'

CREATE TABLE t (s BLOB, n INT, UNIQUE (s)) engine=innodb;
INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
INSERT INTO t VALUES ('Hrecvx_0004mm-00',2) ON DUPLICATE KEY UPDATE n = VALUES(n) ;
SELECT * FROM t;

SELECT * FROM t;
s	n
Hrecvx_0004ln-00	2
Hrecvx_0004mm-00	1

Comment by Alexander Barkov [ 2022-12-16 ]

Works fine with MyISAM:

CREATE OR REPLACE TABLE t1 (s BLOB, n INT, UNIQUE (s)) engine=MyISAM;
INSERT INTO t1 VALUES  ('Hrecvx_0004ln-00',1),
                       ('Hrecvx_0004mm-00',1);
REPLACE INTO t1 VALUES ('Hrecvx_0004mm-00',2);
SELECT * FROM t1;

+------------------+------+
| s                | n    |
+------------------+------+
| Hrecvx_0004ln-00 |    1 |
| Hrecvx_0004mm-00 |    2 |
+------------------+------+

CREATE OR REPLACE TABLE t1 (s BLOB, n INT, UNIQUE (s)) engine=MyISAM;
INSERT INTO t1 VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
INSERT INTO t1 VALUES ('Hrecvx_0004mm-00',2) ON DUPLICATE KEY UPDATE n = VALUE(n) ;
SELECT * FROM t1;

+------------------+------+
| s                | n    |
+------------------+------+
| Hrecvx_0004ln-00 |    1 |
| Hrecvx_0004mm-00 |    2 |
+------------------+------+

Comment by Nikita Malyavin [ 2023-01-09 ]

Thanks bar for the pointers, the bug with IDEMPOTENT replication was found with long uniques, which uses the same logic as REPLACE, and even contains the copy-paste from there. I guess it was made so to allocate and reuse the key buffer memory on the stack, saving from extra malloc. I think our priority now is to minimize stack usage vs extra mallocs, so it's not the point anymore.

I made some refactoring and extracted common code. Also moved handler's RND search initialization to Write_rows_log_event::do_before_row_operations. This fixed many long unique bugs in relplication and optimized the use a little bit.

https://github.com/MariaDB/server/commit/703e73e221a42638f2f05379124b35c57482da93

One more refactoring should be done to generalize handler initialization (and de-initialization) across REPLACE, LOAD DATA, IDEMPOTENT replication, and improve memory usage.

Comment by Otto Kekäläinen [ 2023-12-01 ]

The commit https://github.com/MariaDB/server/commit/703e73e221a42638f2f05379124b35c57482da93 was never in a pull request nor included on any branch, and `git log -S MDEV-30046` does not yield any results from 10.5 branch, so I assume this issue is still open (and thus will also continue to keep https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015293 open).

Generated at Thu Feb 08 10:13:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.