Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30024

InnoDB: tried to purge non-delete-marked record of an index on a virtual column prefix

Details

    Description

      The test case below fails nearly deterministically (but not quite, so run with --repeat=N) on 10.3, 10.4 and 10.6+ since the commit 1562b2c20b5 (MDEV-29666).

      A recent comment in MDEV-20640 seems to suggest it may be a probability issue. I'm not so sure about it. This test fails on the 1st-2nd attempt after the change and didn't fail in 1000 attempts before the change, so how big the change in probability it must have been? And even if it was, it is still a regression from a very remotely possible problem to a very real one.

      --source include/have_innodb.inc
       
      CREATE TABLE t (a BINARY(8) NOT NULL DEFAULT '', b CHAR(8) AS (a) VIRTUAL, KEY(b(4))) CHARACTER SET utf8 ENGINE=InnoDB;
      INSERT INTO t (a) VALUES (''),('');
      UPDATE t SET a = 'x';
      UPDATE t SET a = '';
       
      --sleep 1
       
      # Cleanup
      DROP TABLE t;
      

      10.3 df4c3d96

      2022-11-17  0:10:54 3 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}
      mysqld: /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:599: bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*): Assertion `0' failed.
       
      #7  0x00007f1da665c662 in __GI___assert_fail (assertion=0x560bc03a8a80 "0", file=0x560bc03aae00 "/home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc", line=599, function=0x560bc03ab940 "bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*)") at assert.c:101
      #8  0x0000560bbf21d918 in row_purge_remove_sec_if_poss_leaf (node=0x61a000002508, index=0x618000048508, entry=0x619000051f08) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:599
      #9  0x0000560bbf21df45 in row_purge_remove_sec_if_poss (node=0x61a000002508, index=0x618000048508, entry=0x619000051f08) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:695
      #10 0x0000560bbf21fb37 in row_purge_upd_exist_or_extern_func (thr=0x616000006dd0, node=0x61a000002508, undo_rec=0x613000002060 "") at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:897
      #11 0x0000560bbf223734 in row_purge_record_func (node=0x61a000002508, undo_rec=0x613000002060 "", thr=0x616000006dd0, updated_extern=false) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:1342
      #12 0x0000560bbf223af6 in row_purge (node=0x61a000002508, undo_rec=0x613000002060 "", thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:1386
      #13 0x0000560bbf2240b4 in row_purge_step (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/row/row0purge.cc:1464
      #14 0x0000560bbf100937 in que_thr_step (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/que/que0que.cc:966
      #15 0x0000560bbf100d6e in que_run_threads_low (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/que/que0que.cc:1028
      #16 0x0000560bbf101194 in que_run_threads (thr=0x616000006dd0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/que/que0que.cc:1068
      #17 0x0000560bbf2c9010 in srv_task_execute (slot=0x560bc10049b8 <srv_sys+1016>) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/srv/srv0srv.cc:2452
      #18 0x0000560bbf2c93a9 in srv_worker_thread (arg=0x0) at /home/jenkins/workspace/sandbox-elenst/Nightly-Build-CS/src/storage/innobase/srv/srv0srv.cc:2507
      #19 0x00007f1da6807ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #20 0x00007f1da6727aef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      10.6 ae6ebafd

      2022-11-17  0:23:22 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000000),[6]      (0x000000000200)}
      2022-11-17  0:23:22 0 [ERROR] InnoDB: Flagged corruption of `b` in table `test`.`t` in purge
      

      Attachments

        Issue Links

          Activity

            Here is a little smaller test (shorter columns, fewer transactions and rows):

            --source include/have_innodb.inc
             
            SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
             
            CREATE TABLE t (a BINARY(2) NOT NULL DEFAULT '', b CHAR(2) AS (a) VIRTUAL, KEY(b(1)))
            CHARACTER SET utf8
            ENGINE=InnoDB;
            BEGIN;
            INSERT INTO t (a) VALUES ('');
            UPDATE t SET a = 'x';
            UPDATE t SET a = '';
            COMMIT;
             
            SET GLOBAL innodb_max_purge_lag_wait=0;
            --replace_regex /.*History list length ([0-9]+).*/\1/
            SHOW ENGINE INNODB STATUS;
            SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
             
            DROP TABLE t;
            

            10.6 9aea7d83c8d006519bdf1f3269136b3756ed7548

            2022-11-17 10:25:30 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1] (0x00),[6]      (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1] (0x00),[6]      (0x000000000200)}
            2022-11-17 10:25:30 0 [ERROR] InnoDB: Flagged corruption of `b` in table `test`.`t` in purge
            

            At the time sql_print_error() is invoked by purge, the indexes of the table contain the following records:
            GEN_CLUST_INDEX(DB_ROW_ID,DB_TRX_ID,DB_ROLL_PTR,a)=(0x200,20,(update),'')
            b(b(4),DB_ROW_ID)=('',0x200),delete-marked('x',0x200)

            The contents corresponds to what I would expect. The delete-marked record should be removed.
            The message is output when the undo log record that was written by the first UPDATE statement is being processed.

            There was no call to innobase_get_computed_value() before the message was output. During the purge of history, it could be called by row_vers_old_has_index_entry(true, …).

            The corresponding test that does not involve a virtual column does not fail:

            CREATE TABLE t (a BINARY(8) NOT NULL DEFAULT '', KEY(a(4))) CHARACTER SET utf8 ENGINE=InnoDB;
            

            If I change the table to use a different collation, the test will not fail. It fails with the following:

            CREATE TABLE t (a BINARY(2) NOT NULL, b CHAR(2) AS (a) VIRTUAL, KEY(b(1)))
            CHARACTER SET utf8
            ENGINE=InnoDB;
            

            It will not fail if I replace either the BINARY with CHAR or omit the CHARACTER SET utf8 specification. Due to this role of character sets and collations, I strongly suspect that something is going wrong in the SQL layer.

            marko Marko Mäkelä added a comment - Here is a little smaller test (shorter columns, fewer transactions and rows): --source include/have_innodb.inc   SET @save_freq=@@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET GLOBAL innodb_purge_rseg_truncate_frequency=1;   CREATE TABLE t (a BINARY (2) NOT NULL DEFAULT '' , b CHAR (2) AS (a) VIRTUAL, KEY (b(1))) CHARACTER SET utf8 ENGINE=InnoDB; BEGIN ; INSERT INTO t (a) VALUES ( '' ); UPDATE t SET a = 'x' ; UPDATE t SET a = '' ; COMMIT ;   SET GLOBAL innodb_max_purge_lag_wait=0; --replace_regex /.*History list length ([0-9]+).*/\1/ SHOW ENGINE INNODB STATUS; SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;   DROP TABLE t; 10.6 9aea7d83c8d006519bdf1f3269136b3756ed7548 2022-11-17 10:25:30 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `b` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1] (0x00),[6] (0x000000000200)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1] (0x00),[6] (0x000000000200)} 2022-11-17 10:25:30 0 [ERROR] InnoDB: Flagged corruption of `b` in table `test`.`t` in purge At the time sql_print_error() is invoked by purge, the indexes of the table contain the following records: GEN_CLUST_INDEX(DB_ROW_ID,DB_TRX_ID,DB_ROLL_PTR,a)=(0x200,20,(update),'') b(b(4),DB_ROW_ID)=('',0x200),delete-marked('x',0x200) The contents corresponds to what I would expect. The delete-marked record should be removed. The message is output when the undo log record that was written by the first UPDATE statement is being processed. There was no call to innobase_get_computed_value() before the message was output. During the purge of history, it could be called by row_vers_old_has_index_entry(true, …) . The corresponding test that does not involve a virtual column does not fail: CREATE TABLE t (a BINARY (8) NOT NULL DEFAULT '' , KEY (a(4))) CHARACTER SET utf8 ENGINE=InnoDB; If I change the table to use a different collation, the test will not fail. It fails with the following: CREATE TABLE t (a BINARY (2) NOT NULL , b CHAR (2) AS (a) VIRTUAL, KEY (b(1))) CHARACTER SET utf8 ENGINE=InnoDB; It will not fail if I replace either the BINARY with CHAR or omit the CHARACTER SET utf8 specification. Due to this role of character sets and collations, I strongly suspect that something is going wrong in the SQL layer.

            There is a rather similar test case in MySQL Bug #86485:

            set sql_mode="";
            drop table if exists t;
            create table t (
              `a` int,
              `b` int not null,
              `c` char(2) generated always as ('aa') virtual,
              unique key(`b`),
              unique key(`c`(1))
            ) engine=innodb default charset=utf8 row_format=compact;
             
            drop procedure if exists p;
            delimiter $
            create procedure p()
            begin
              declare continue handler for sqlexception begin end;
              repeat
                if rand()>0.5 then insert into t(a,b) values(floor(rand()*10),floor(rand()*10)); end if;
                if rand()>0.5 then delete from t where a<floor(rand()*10); end if;
                if rand()>0.5 then update t set b=concat(a,a); end if;
                if rand()>0.5 then replace into t(a,b) values(floor(rand()*10),floor(rand()*10)); end if;
              until 1=2 end repeat;
            end $
            delimiter ;
             
            call p();
            

            As far as I can tell, the column a as well as the specification row_format=compact should be irrelevant. The column b could be defined as a PRIMARY KEY. Here is a slightly simplified mtr test:

            --source include/have_innodb.inc
             
            set sql_mode="";
            create table t (
              b int primary key,
              c char(2) generated always as ('aa') virtual,
              unique key(c(1))
            ) engine=innodb default charset=utf8;
             
            delimiter $;
            create procedure p()
            begin
              declare continue handler for sqlexception begin end;
              repeat
                if rand()>0.5 then insert into t(b) values(floor(rand()*10)); end if;
                if rand()>0.5 then delete from t; end if;
                if rand()>0.5 then update t set b=floor(rand()*10); end if;
                if rand()>0.5 then replace into t(b) values(floor(rand()*10)); end if;
              until 1=2 end repeat;
            end $
            delimiter ;$
             
            call p();
            

            This will eventually crash due to a debug assertion failure:

            10.6 2325f8f33971c29503639126c69e549aa53b5585

            mysqltest: At line 23: query 'call p()' failed: <Unknown> (2013): Lost connection to server during query
            …
            Version: '10.6.16-MariaDB-debug-log'  socket: '/dev/shm/10.6/mysql-test/var/tmp/4/mysqld.1.sock'  port: 16060  Source distribution
            2023-09-04  8:48:30 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `c` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1]a(0x61),[4]    (0x80000000)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1]a(0x61),[4]    (0x80000000)}
            mariadbd: /mariadb/10.6/storage/innobase/btr/btr0cur.cc:1011: dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*): Assertion `index()->is_btree() || index()->is_ibuf()' failed.
            2023-09-04  8:48:30 0 [ERROR] InnoDB: Flagged corruption of `c` in table `test`.`t` in purge
            

            There was an attempt to fix this bug in MySQL 8.0.11 but never in MySQL 5.7 or MariaDB Server. I think that the fix may be incorrect, because it is dealing with bytes and not characters. A test case that used non-ASCII data should reveal it. With this port of the suspected incorrect patch and the above test case, 10.6 does not crash for me:

            diff --git a/storage/innobase/row/row0vers.cc b/storage/innobase/row/row0vers.cc
            index a4fc32cc5a8..de1586ac9af 100644
            --- a/storage/innobase/row/row0vers.cc
            +++ b/storage/innobase/row/row0vers.cc
            @@ -734,9 +734,15 @@ row_vers_vc_matches_cluster(
             			    && (!compare[v_col->v_pos])) {
             
             				if (ind_field->prefix_len != 0
            -				    && !dfield_is_null(field2)
            -				    && field2->len > ind_field->prefix_len) {
            -					field2->len = ind_field->prefix_len;
            +				    && !dfield_is_null(field2)) {
            +					if (field2->len > ind_field->prefix_len) {
            +						field2->len = ind_field->prefix_len;
            +					}
            +					if (field2->type.mbmaxlen > 1) {
            +						/* FIXME: extract the correct number of characters
            +						(not bytes)! */
            +						field2->len = field1->len;
            +					}
             				}
             
             				/* The index field mismatch */
            

            I interrupted the test after 143 seconds (23 concurrent instances). The last run without the patch would end in a crash in 26 seconds.

            marko Marko Mäkelä added a comment - There is a rather similar test case in MySQL Bug #86485 : set sql_mode= "" ; drop table if exists t; create table t ( `a` int , `b` int not null , `c` char (2) generated always as ( 'aa' ) virtual, unique key (`b`), unique key (`c`(1)) ) engine=innodb default charset=utf8 row_format=compact;   drop procedure if exists p; delimiter $ create procedure p() begin declare continue handler for sqlexception begin end ; repeat if rand()>0.5 then insert into t(a,b) values (floor(rand()*10),floor(rand()*10)); end if ; if rand()>0.5 then delete from t where a<floor(rand()*10); end if ; if rand()>0.5 then update t set b=concat(a,a); end if ; if rand()>0.5 then replace into t(a,b) values (floor(rand()*10),floor(rand()*10)); end if ; until 1=2 end repeat; end $ delimiter ;   call p(); As far as I can tell, the column a as well as the specification row_format=compact should be irrelevant. The column b could be defined as a PRIMARY KEY . Here is a slightly simplified mtr test: --source include/have_innodb.inc   set sql_mode= "" ; create table t ( b int primary key , c char (2) generated always as ( 'aa' ) virtual, unique key (c(1)) ) engine=innodb default charset=utf8;   delimiter $; create procedure p() begin declare continue handler for sqlexception begin end ; repeat if rand()>0.5 then insert into t(b) values (floor(rand()*10)); end if ; if rand()>0.5 then delete from t; end if ; if rand()>0.5 then update t set b=floor(rand()*10); end if ; if rand()>0.5 then replace into t(b) values (floor(rand()*10)); end if ; until 1=2 end repeat; end $ delimiter ;$   call p(); This will eventually crash due to a debug assertion failure: 10.6 2325f8f33971c29503639126c69e549aa53b5585 mysqltest: At line 23: query 'call p()' failed: <Unknown> (2013): Lost connection to server during query … Version: '10.6.16-MariaDB-debug-log' socket: '/dev/shm/10.6/mysql-test/var/tmp/4/mysqld.1.sock' port: 16060 Source distribution 2023-09-04 8:48:30 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `c` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1]a(0x61),[4] (0x80000000)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1]a(0x61),[4] (0x80000000)} mariadbd: /mariadb/10.6/storage/innobase/btr/btr0cur.cc:1011: dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*): Assertion `index()->is_btree() || index()->is_ibuf()' failed. 2023-09-04 8:48:30 0 [ERROR] InnoDB: Flagged corruption of `c` in table `test`.`t` in purge There was an attempt to fix this bug in MySQL 8.0.11 but never in MySQL 5.7 or MariaDB Server. I think that the fix may be incorrect, because it is dealing with bytes and not characters. A test case that used non-ASCII data should reveal it. With this port of the suspected incorrect patch and the above test case, 10.6 does not crash for me: diff --git a/storage/innobase/row/row0vers.cc b/storage/innobase/row/row0vers.cc index a4fc32cc5a8..de1586ac9af 100644 --- a/storage/innobase/row/row0vers.cc +++ b/storage/innobase/row/row0vers.cc @@ -734,9 +734,15 @@ row_vers_vc_matches_cluster( && (!compare[v_col->v_pos])) { if (ind_field->prefix_len != 0 - && !dfield_is_null(field2) - && field2->len > ind_field->prefix_len) { - field2->len = ind_field->prefix_len; + && !dfield_is_null(field2)) { + if (field2->len > ind_field->prefix_len) { + field2->len = ind_field->prefix_len; + } + if (field2->type.mbmaxlen > 1) { + /* FIXME: extract the correct number of characters + (not bytes)! */ + field2->len = field1->len; + } } /* The index field mismatch */ I interrupted the test after 143 seconds (23 concurrent instances). The last run without the patch would end in a crash in 26 seconds.

            I think that the correct approach is to invoke dtype_get_at_most_n_mbchars(), like row_sel_sec_rec_is_for_clust_rec() and the MDEV-24402 row_check_index_match() do.

            marko Marko Mäkelä added a comment - I think that the correct approach is to invoke dtype_get_at_most_n_mbchars() , like row_sel_sec_rec_is_for_clust_rec() and the MDEV-24402 row_check_index_match() do.

            I tested an improved version of the test case in the Description:

            --source include/have_innodb.inc
            CREATE TABLE t (a BINARY(8) NOT NULL DEFAULT '', b CHAR(8) AS (a) VIRTUAL, KEY(b(4))) CHARACTER SET utf8 ENGINE=InnoDB;
            INSERT INTO t (a) VALUES (''),('');
            UPDATE t SET a = 'x';
            UPDATE t SET a = '';
             
            SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
            SET GLOBAL innodb_max_purge_lag_wait=0;
            SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
             
            CHECK TABLE t EXTENDED;
            DROP TABLE t;
            

            This test case passed 5×100 runs when the patch was applied. When not, it would fail reliably thanks to MDEV-24402:

            10.6 0f870914d410e8a34ea99b1e0ea7eb71094606f9

            @@ -8,5 +8,6 @@
             SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
             CHECK TABLE t EXTENDED;
             Table	Op	Msg_type	Msg_text
            -test.t	check	status	OK
            +test.t	check	Warning	InnoDB: Index b is marked as corrupted
            +test.t	check	error	Corrupt
             DROP TABLE t;
             
            mysqltest: Result length mismatch
            

            marko Marko Mäkelä added a comment - I tested an improved version of the test case in the Description: --source include/have_innodb.inc CREATE TABLE t (a BINARY (8) NOT NULL DEFAULT '' , b CHAR (8) AS (a) VIRTUAL, KEY (b(4))) CHARACTER SET utf8 ENGINE=InnoDB; INSERT INTO t (a) VALUES ( '' ),( '' ); UPDATE t SET a = 'x' ; UPDATE t SET a = '' ;   SET @save_freq=@@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET GLOBAL innodb_purge_rseg_truncate_frequency=1; SET GLOBAL innodb_max_purge_lag_wait=0; SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;   CHECK TABLE t EXTENDED; DROP TABLE t; This test case passed 5×100 runs when the patch was applied. When not, it would fail reliably thanks to MDEV-24402 : 10.6 0f870914d410e8a34ea99b1e0ea7eb71094606f9 @@ -8,5 +8,6 @@ SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq; CHECK TABLE t EXTENDED; Table Op Msg_type Msg_text -test.t check status OK +test.t check Warning InnoDB: Index b is marked as corrupted +test.t check error Corrupt DROP TABLE t;   mysqltest: Result length mismatch

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.