[MDEV-30024] InnoDB: tried to purge non-delete-marked record of an index on a virtual column prefix Created: 2022-11-16  Updated: 2023-10-03  Resolved: 2023-09-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Virtual Columns
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-29666 InnoDB fails to purge secondary index... Closed
Relates

 Description   

The test case below fails nearly deterministically (but not quite, so run with --repeat=N) on 10.3, 10.4 and 10.6+ since the commit 1562b2c20b5 (MDEV-29666).

A recent comment in MDEV-20640 seems to suggest it may be a probability issue. I'm not so sure about it. This test fails on the 1st-2nd attempt after the change and didn't fail in 1000 attempts before the change, so how big the change in probability it must have been? And even if it was, it is still a regression from a very remotely possible problem to a very real one.

--source include/have_innodb.inc
 
CREATE TABLE t (a BINARY(8) NOT NULL DEFAULT '', b CHAR(8) AS (a) VIRTUAL, KEY(b(4))) CHARACTER SET utf8 ENGINE=InnoDB;
INSERT INTO t (a) VALUES (''),('');
UPDATE t SET a = 'x';
UPDATE t SET a = '';
 
--sleep 1
 
# Cleanup
DROP TABLE t;

10.3 df4c3d96

2022-11-17  0:10:54 3 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}
mysqld: /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:599: bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*): Assertion `0' failed.
 
#7  0x00007f1da665c662 in __GI___assert_fail (assertion=0x560bc03a8a80 "0", file=0x560bc03aae00 "/home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc", line=599, function=0x560bc03ab940 "bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*)") at assert.c:101
#8  0x0000560bbf21d918 in row_purge_remove_sec_if_poss_leaf (node=0x61a000002508, index=0x618000048508, entry=0x619000051f08) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:599
#9  0x0000560bbf21df45 in row_purge_remove_sec_if_poss (node=0x61a000002508, index=0x618000048508, entry=0x619000051f08) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:695
#10 0x0000560bbf21fb37 in row_purge_upd_exist_or_extern_func (thr=0x616000006dd0, node=0x61a000002508, undo_rec=0x613000002060 "") at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:897
#11 0x0000560bbf223734 in row_purge_record_func (node=0x61a000002508, undo_rec=0x613000002060 "", thr=0x616000006dd0, updated_extern=false) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:1342
#12 0x0000560bbf223af6 in row_purge (node=0x61a000002508, undo_rec=0x613000002060 "", thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:1386
#13 0x0000560bbf2240b4 in row_purge_step (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:1464
#14 0x0000560bbf100937 in que_thr_step (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/que/que0que.cc:966
#15 0x0000560bbf100d6e in que_run_threads_low (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/que/que0que.cc:1028
#16 0x0000560bbf101194 in que_run_threads (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/que/que0que.cc:1068
#17 0x0000560bbf2c9010 in srv_task_execute (slot=0x560bc10049b8 <srv_sys+1016>) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/srv/srv0srv.cc:2452
#18 0x0000560bbf2c93a9 in srv_worker_thread (arg=0x0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/srv/srv0srv.cc:2507
#19 0x00007f1da6807ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f1da6727aef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6 ae6ebafd

2022-11-17  0:23:22 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}
2022-11-17  0:23:22 0 [ERROR] InnoDB: Flagged corruption of `b` in table `test`.`t` in purge



 Comments   
Comment by Marko Mäkelä [ 2022-11-17 ]

Here is a little smaller test (shorter columns, fewer transactions and rows):

--source include/have_innodb.inc
 
SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
 
CREATE TABLE t (a BINARY(2) NOT NULL DEFAULT '', b CHAR(2) AS (a) VIRTUAL, KEY(b(1)))
CHARACTER SET utf8
ENGINE=InnoDB;
BEGIN;
INSERT INTO t (a) VALUES ('');
UPDATE t SET a = 'x';
UPDATE t SET a = '';
COMMIT;
 
SET GLOBAL innodb_max_purge_lag_wait=0;
--replace_regex /.*History list length ([0-9]+).*/\1/
SHOW ENGINE INNODB STATUS;
SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
 
DROP TABLE t;

10.6 9aea7d83c8d006519bdf1f3269136b3756ed7548

2022-11-17 10:25:30 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1] (0x00),[6]      (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1] (0x00),[6]      (0x000000000200)}
2022-11-17 10:25:30 0 [ERROR] InnoDB: Flagged corruption of `b` in table `test`.`t` in purge

At the time sql_print_error() is invoked by purge, the indexes of the table contain the following records:
GEN_CLUST_INDEX(DB_ROW_ID,DB_TRX_ID,DB_ROLL_PTR,a)=(0x200,20,(update),'')
b(b(4),DB_ROW_ID)=('',0x200),delete-marked('x',0x200)

The contents corresponds to what I would expect. The delete-marked record should be removed.
The message is output when the undo log record that was written by the first UPDATE statement is being processed.

There was no call to innobase_get_computed_value() before the message was output. During the purge of history, it could be called by row_vers_old_has_index_entry(true, …).

The corresponding test that does not involve a virtual column does not fail:

CREATE TABLE t (a BINARY(8) NOT NULL DEFAULT '', KEY(a(4))) CHARACTER SET utf8 ENGINE=InnoDB;

If I change the table to use a different collation, the test will not fail. It fails with the following:

CREATE TABLE t (a BINARY(2) NOT NULL, b CHAR(2) AS (a) VIRTUAL, KEY(b(1)))
CHARACTER SET utf8
ENGINE=InnoDB;

It will not fail if I replace either the BINARY with CHAR or omit the CHARACTER SET utf8 specification. Due to this role of character sets and collations, I strongly suspect that something is going wrong in the SQL layer.

Comment by Marko Mäkelä [ 2023-09-04 ]

There is a rather similar test case in MySQL Bug #86485:

set sql_mode="";
drop table if exists t;
create table t (
  `a` int,
  `b` int not null,
  `c` char(2) generated always as ('aa') virtual,
  unique key(`b`),
  unique key(`c`(1))
) engine=innodb default charset=utf8 row_format=compact;
 
drop procedure if exists p;
delimiter $
create procedure p()
begin
  declare continue handler for sqlexception begin end;
  repeat
    if rand()>0.5 then insert into t(a,b) values(floor(rand()*10),floor(rand()*10)); end if;
    if rand()>0.5 then delete from t where a<floor(rand()*10); end if;
    if rand()>0.5 then update t set b=concat(a,a); end if;
    if rand()>0.5 then replace into t(a,b) values(floor(rand()*10),floor(rand()*10)); end if;
  until 1=2 end repeat;
end $
delimiter ;
 
call p();

As far as I can tell, the column a as well as the specification row_format=compact should be irrelevant. The column b could be defined as a PRIMARY KEY. Here is a slightly simplified mtr test:

--source include/have_innodb.inc
 
set sql_mode="";
create table t (
  b int primary key,
  c char(2) generated always as ('aa') virtual,
  unique key(c(1))
) engine=innodb default charset=utf8;
 
delimiter $;
create procedure p()
begin
  declare continue handler for sqlexception begin end;
  repeat
    if rand()>0.5 then insert into t(b) values(floor(rand()*10)); end if;
    if rand()>0.5 then delete from t; end if;
    if rand()>0.5 then update t set b=floor(rand()*10); end if;
    if rand()>0.5 then replace into t(b) values(floor(rand()*10)); end if;
  until 1=2 end repeat;
end $
delimiter ;$
 
call p();

This will eventually crash due to a debug assertion failure:

10.6 2325f8f33971c29503639126c69e549aa53b5585

mysqltest: At line 23: query 'call p()' failed: <Unknown> (2013): Lost connection to server during query
Version: '10.6.16-MariaDB-debug-log'  socket: '/dev/shm/10.6/mysql-test/var/tmp/4/mysqld.1.sock'  port: 16060  Source distribution
2023-09-04  8:48:30 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `c` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1]a(0x61),[4]    (0x80000000)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1]a(0x61),[4]    (0x80000000)}
mariadbd: /mariadb/10.6/storage/innobase/btr/btr0cur.cc:1011: dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*): Assertion `index()->is_btree() || index()->is_ibuf()' failed.
2023-09-04  8:48:30 0 [ERROR] InnoDB: Flagged corruption of `c` in table `test`.`t` in purge

There was an attempt to fix this bug in MySQL 8.0.11 but never in MySQL 5.7 or MariaDB Server. I think that the fix may be incorrect, because it is dealing with bytes and not characters. A test case that used non-ASCII data should reveal it. With this port of the suspected incorrect patch and the above test case, 10.6 does not crash for me:

diff --git a/storage/innobase/row/row0vers.cc b/storage/innobase/row/row0vers.cc
index a4fc32cc5a8..de1586ac9af 100644
--- a/storage/innobase/row/row0vers.cc
+++ b/storage/innobase/row/row0vers.cc
@@ -734,9 +734,15 @@ row_vers_vc_matches_cluster(
 			    && (!compare[v_col->v_pos])) {
 
 				if (ind_field->prefix_len != 0
-				    && !dfield_is_null(field2)
-				    && field2->len > ind_field->prefix_len) {
-					field2->len = ind_field->prefix_len;
+				    && !dfield_is_null(field2)) {
+					if (field2->len > ind_field->prefix_len) {
+						field2->len = ind_field->prefix_len;
+					}
+					if (field2->type.mbmaxlen > 1) {
+						/* FIXME: extract the correct number of characters
+						(not bytes)! */
+						field2->len = field1->len;
+					}
 				}
 
 				/* The index field mismatch */

I interrupted the test after 143 seconds (23 concurrent instances). The last run without the patch would end in a crash in 26 seconds.

Comment by Marko Mäkelä [ 2023-09-18 ]

I think that the correct approach is to invoke dtype_get_at_most_n_mbchars(), like row_sel_sec_rec_is_for_clust_rec() and the MDEV-24402 row_check_index_match() do.

Comment by Marko Mäkelä [ 2023-09-18 ]

I tested an improved version of the test case in the Description:

--source include/have_innodb.inc
CREATE TABLE t (a BINARY(8) NOT NULL DEFAULT '', b CHAR(8) AS (a) VIRTUAL, KEY(b(4))) CHARACTER SET utf8 ENGINE=InnoDB;
INSERT INTO t (a) VALUES (''),('');
UPDATE t SET a = 'x';
UPDATE t SET a = '';
 
SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
SET GLOBAL innodb_max_purge_lag_wait=0;
SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
 
CHECK TABLE t EXTENDED;
DROP TABLE t;

This test case passed 5×100 runs when the patch was applied. When not, it would fail reliably thanks to MDEV-24402:

10.6 0f870914d410e8a34ea99b1e0ea7eb71094606f9

@@ -8,5 +8,6 @@
 SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
 CHECK TABLE t EXTENDED;
 Table	Op	Msg_type	Msg_text
-test.t	check	status	OK
+test.t	check	Warning	InnoDB: Index b is marked as corrupted
+test.t	check	error	Corrupt
 DROP TABLE t;
 
mysqltest: Result length mismatch

Generated at Thu Feb 08 10:13:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.