Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15060

Assertion `0' failed in row_log_table_apply_op after instant ADD when the table is emptied during subsequent ALTER TABLE

Details

    Description

      Note: run the test case with --mem --repeat=N. For me N=100 has always been enough so far, but it can vary on different machines. --mem is important, at least on my machines, apparently the test is not fast/concurrent enough when it's run on disk.

      --source include/have_innodb.inc
       
      --disable_abort_on_error
       
      CREATE TABLE t1 (a INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (NULL),(NULL);
      ALTER TABLE t1 ADD COLUMN b INT;
      ALTER TABLE t1 ADD UNIQUE KEY uidx (b);
       
      --connect (con1,localhost,root,,test)
      --send
      	ALTER TABLE t1 DROP COLUMN b;
       
      --connection default
      DELETE FROM t1;
      INSERT INTO t1 (b) VALUES (1),(1);
       
      # Cleanup
      --connection con1
      --reap
      --disconnect con1
      --connection default
      DROP TABLE t1;
      

      10.3 cb16bc95ff

      mysqld: /data/src/10.3/storage/innobase/row/row0log.cc:2435: const mrec_t* row_log_table_apply_op(que_thr_t*, ulint, row_merge_dup_t*, dberr_t*, mem_heap_t*, mem_heap_t*, const mrec_t*, const mrec_t*, ulint*): Assertion `0' failed.
      180430 18:12:25 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007f48b64a0ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #8  0x000055bf625b97d3 in row_log_table_apply_op (thr=0x7f48600771c8, new_trx_id_col=1, dup=0x7f48ab74fd80, error=0x7f48ab74fb2c, offsets_heap=0x7f486003fe60, heap=0x7f486003bdd0, mrec=0x7f48ab60c021 "", mrec_end=0x7f48ab60c06c "", offsets=0x7f48600358c0) at /data/src/10.3/storage/innobase/row/row0log.cc:2435
      #9  0x000055bf625bbd36 in row_log_table_apply_ops (thr=0x7f48600771c8, dup=0x7f48ab74fd80, stage=0x7f4860038100) at /data/src/10.3/storage/innobase/row/row0log.cc:3046
      #10 0x000055bf625bc2ee in row_log_table_apply (thr=0x7f48600771c8, old_table=0x7f485c03c1e8, table=0x7f48600d48b0, stage=0x7f4860038100) at /data/src/10.3/storage/innobase/row/row0log.cc:3146
      #11 0x000055bf62495fdf in commit_try_rebuild (ha_alter_info=0x7f48ab7513f0, ctx=0x7f4860014e88, altered_table=0x7f48600d48b0, old_table=0x7f485c085b70, trx=0x7f48b02dcac8, table_name=0x7f485c08a185 "t1") at /data/src/10.3/storage/innobase/handler/handler0alter.cc:8439
      #12 0x000055bf62490618 in ha_innobase::commit_inplace_alter_table (this=0x7f485c087368, altered_table=0x7f48600d48b0, ha_alter_info=0x7f48ab7513f0, commit=true) at /data/src/10.3/storage/innobase/handler/handler0alter.cc:9314
      #13 0x000055bf62138de6 in handler::ha_commit_inplace_alter_table (this=0x7f485c087368, altered_table=0x7f48600d48b0, ha_alter_info=0x7f48ab7513f0, commit=true) at /data/src/10.3/sql/handler.cc:4406
      #14 0x000055bf61f23d58 in mysql_inplace_alter_table (thd=0x7f4860000b00, table_list=0x7f4860013bd8, table=0x7f485c085b70, altered_table=0x7f48600d48b0, ha_alter_info=0x7f48ab7513f0, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7f48ab751460, alter_ctx=0x7f48ab752060) at /data/src/10.3/sql/sql_table.cc:7535
      #15 0x000055bf61f29c3f in mysql_alter_table (thd=0x7f4860000b00, new_db=0x7f4860005168, new_name=0x7f4860005518, create_info=0x7f48ab752c50, table_list=0x7f4860013bd8, alter_info=0x7f48ab752b90, order_num=0, order=0x0, ignore=false) at /data/src/10.3/sql/sql_table.cc:9670
      #16 0x000055bf61faf0fc in Sql_cmd_alter_table::execute (this=0x7f4860014258, thd=0x7f4860000b00) at /data/src/10.3/sql/sql_alter.cc:334
      #17 0x000055bf61e55f79 in mysql_execute_command (thd=0x7f4860000b00) at /data/src/10.3/sql/sql_parse.cc:6282
      #18 0x000055bf61e5aa46 in mysql_parse (thd=0x7f4860000b00, rawbuf=0x7f4860013af0 "ALTER TABLE t1 DROP COLUMN b", length=28, parser_state=0x7f48ab7545d0, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8001
      #19 0x000055bf61e48229 in dispatch_command (command=COM_QUERY, thd=0x7f4860000b00, packet=0x7f486000b271 "ALTER TABLE t1 DROP COLUMN b", packet_length=28, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1846
      #20 0x000055bf61e46c68 in do_command (thd=0x7f4860000b00) at /data/src/10.3/sql/sql_parse.cc:1391
      #21 0x000055bf61fa9aef in do_handle_one_connection (connect=0x55bf64f3b540) at /data/src/10.3/sql/sql_connect.cc:1402
      #22 0x000055bf61fa987c in handle_one_connection (arg=0x55bf64f3b540) at /data/src/10.3/sql/sql_connect.cc:1308
      #23 0x000055bf6242da8d in pfs_spawn_thread (arg=0x55bf64f43d70) at /data/src/10.3/storage/perfschema/pfs.cc:1862
      #24 0x00007f48b8177494 in start_thread (arg=0x7f48ab755700) at pthread_create.c:333
      #25 0x00007f48b655d93f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      Not reproducible on 10.2.

      Initially this problem was interchangeable with MDEV-14906 (in the sense that they both were happening randomly on the same test cases), but MDEV-14906 has been presumably fixed now, and this one still happens.

      Attachments

        Issue Links

          Activity

            I would need a core dump so that I can investigate the contents of the online_log as well as the table definitions before and after table-rebuilding online alter.

            marko Marko Mäkelä added a comment - I would need a core dump so that I can investigate the contents of the online_log as well as the table definitions before and after table-rebuilding online alter.

            I've updated the description, hopefully with the test case you will get a coredump of your liking. If not, I can produce one for you, but it will be a non-ASAN coredump.

            elenst Elena Stepanova added a comment - I've updated the description, hopefully with the test case you will get a coredump of your liking. If not, I can produce one for you, but it will be a non-ASAN coredump.

            I repeated this. In my core dump, the payload of log.tail.block is as follows (one record per line):

            0x41, 3, 4, 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1,
            0x43, 3, 0xd1, 0x80, 0, 0, 1,
            0x41, 3, 4, 1, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1,
            0x43, 3, 0xdf, 0x80, 0, 0, 1,
            0x43, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1,
            0x43, 3, 0xdf,
            0x43, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1,
            0x43, 3, 0xd1

            The assertion fails because the log record parser does not find a valid start byte (ROW_T_INSERT=0x41 or ROW_T_UPDATE=0x42 or ROW_T_DELETE=0x43) at offset 32. That offset is 4 bytes after the start of the third record.

            Notably, other records that start with 0x43,3 are 4 bytes shorter than the first one. I must find out why the 4 extra bytes are sometimes being written.

            marko Marko Mäkelä added a comment - I repeated this. In my core dump, the payload of log.tail.block is as follows (one record per line): 0x41, 3, 4, 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1, 0x43, 3, 0xd1, 0x80, 0, 0, 1, 0x41, 3, 4, 1, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1, 0x43, 3, 0xdf, 0x80, 0, 0, 1, 0x43, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1, 0x43, 3, 0xdf, 0x43, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0x34, 0x8d, 0, 0, 1, 0x43, 3, 0xd1 The assertion fails because the log record parser does not find a valid start byte ( ROW_T_INSERT=0x41 or ROW_T_UPDATE=0x42 or ROW_T_DELETE=0x43 ) at offset 32. That offset is 4 bytes after the start of the third record. Notably, other records that start with 0x43,3 are 4 bytes shorter than the first one. I must find out why the 4 extra bytes are sometimes being written.

            Here is a little simpler test:

            --source include/have_innodb.inc
            --disable_abort_on_error
            CREATE TABLE t1 (a INT) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (NULL);
            ALTER TABLE t1 ADD COLUMN b INT NOT NULL;
            --connect (con1,localhost,root,,test)
            send ALTER TABLE t1 FORCE;
            --connection default
            DELETE FROM t1;
            BEGIN;
            INSERT INTO t1 SET b=1;
            ROLLBACK;
            --disconnect con1
            DROP TABLE t1;
            

            I cannot repeat if I declare a INT NOT NULL (which would also be implied by PRIMARY KEY) or if I insert non-NULL values into a.
            The parsing always fails at offset 33, after exactly one ROW_T_INSERT=0x41 record, and 4 bytes after the apparent start of the second record which is ROW_T_DELETE=0x43.
            It looks like the DELETE FROM t1 would be completed before the ALTER TABLE t1 FORCE starts copying the table. Then, we should have exactly one ROW_T_INSERT and ROW_T_DELETE for the INSERT and ROLLBACK.
            Here is a sample of the log with the above test:

            0x41, 3, 4, 1, 0, 0, 0, 0, 0, 2, 0x27, 0, 0, 0, 0, 2, 0x43, 0xf3, 0, 0, 1, 0xb3, 1, 0x10, 0x80, 0, 0, 1,
            0x43, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0x27, 0, 0, 0, 0, 2, 0x43, 0xf3, 0, 0, 1, 0xb3, 1, 0x10

            Decoded:

            ROW_T_INSERT((4,1,0),DB_ROW_ID=0x227,DB_TRX_ID=0x243,DB_ROLL_PTR=…,a=NULL,b=1)
            ROW_T_DELETE((0,ext_size=0),DB_ROW_ID=0x227,DB_TRX_ID=0x243,ROLL_PTR=…)

            The problem appears to be that the parser expects the ROW_T_INSERT to be longer. It looks like this is because purge emptied the table, and the table no longer is in instant-added format. I wrote a DEBUG_SYNC version of the test that always crashes:

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
             
            SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
             
            CREATE TABLE t1 (a INT) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (NULL);
            ALTER TABLE t1 ADD COLUMN b INT NOT NULL;
            connect stop_purge,localhost,root;
            START TRANSACTION WITH CONSISTENT SNAPSHOT;
            connect ddl,localhost,root,,test;
            DELETE FROM t1;
            SET DEBUG_SYNC='row_log_table_apply1_before SIGNAL copied WAIT_FOR logged';
            send ALTER TABLE t1 FORCE;
            connection default;
            SET DEBUG_SYNC='now WAIT_FOR copied';
             
            BEGIN;
            INSERT INTO t1 SET b=1;
            ROLLBACK;
            disconnect stop_purge;
             
            # Wait for purge to empty the table.
            # (This is based on wait_all_purged.inc, but there are 2 transactions
            # from the pending ALTER TABLE t1 FORCE.)
             
            let $wait_counter= 300;
            while ($wait_counter)
            {
                --replace_regex /.*History list length ([0-9]+).*/\1/
                let $remaining= `SHOW ENGINE INNODB STATUS`;
                if ($remaining == 'InnoDB		2')
                {
                    let $wait_counter= 0;
                }
                if ($wait_counter)
                {
                    real_sleep 0.1;
                    dec $wait_counter;
                }
            }
            echo $remaining transactions not purged;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
             
            SET DEBUG_SYNC='now SIGNAL logged';
            disconnect ddl;
            DROP TABLE t1;
            SET DEBUG_SYNC='RESET';
            

            It is important that we wait for the purge after the INSERT and ROLLBACK have been logged. In that way, the log for the INSERT and the ROLLBACK will be written as if the table is in the instant-add format. When the log is applied, the source table would be in the plain format, and the log would be parsed incorrectly.

            marko Marko Mäkelä added a comment - Here is a little simpler test: --source include/have_innodb.inc --disable_abort_on_error CREATE TABLE t1 (a INT ) ENGINE=InnoDB; INSERT INTO t1 VALUES ( NULL ); ALTER TABLE t1 ADD COLUMN b INT NOT NULL ; --connect (con1,localhost,root,,test) send ALTER TABLE t1 FORCE ; --connection default DELETE FROM t1; BEGIN ; INSERT INTO t1 SET b=1; ROLLBACK ; --disconnect con1 DROP TABLE t1; I cannot repeat if I declare a INT NOT NULL (which would also be implied by PRIMARY KEY ) or if I insert non- NULL values into a . The parsing always fails at offset 33, after exactly one ROW_T_INSERT=0x41 record, and 4 bytes after the apparent start of the second record which is ROW_T_DELETE=0x43 . It looks like the DELETE FROM t1 would be completed before the ALTER TABLE t1 FORCE starts copying the table. Then, we should have exactly one ROW_T_INSERT and ROW_T_DELETE for the INSERT and ROLLBACK . Here is a sample of the log with the above test: 0x41, 3, 4, 1, 0, 0, 0, 0, 0, 2, 0x27, 0, 0, 0, 0, 2, 0x43, 0xf3, 0, 0, 1, 0xb3, 1, 0x10, 0x80, 0, 0, 1, 0x43, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0x27, 0, 0, 0, 0, 2, 0x43, 0xf3, 0, 0, 1, 0xb3, 1, 0x10 Decoded: ROW_T_INSERT((4,1,0),DB_ROW_ID=0x227,DB_TRX_ID=0x243,DB_ROLL_PTR=…,a=NULL,b=1) ROW_T_DELETE((0,ext_size=0),DB_ROW_ID=0x227,DB_TRX_ID=0x243,ROLL_PTR=…) The problem appears to be that the parser expects the ROW_T_INSERT  to be longer. It looks like this is because purge emptied the table, and the table no longer is in instant-added format. I wrote a DEBUG_SYNC version of the test that always crashes: --source include/have_innodb.inc --source include/have_debug.inc --source include/have_debug_sync.inc   SET @saved_frequency = @@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;   CREATE TABLE t1 (a INT ) ENGINE=InnoDB; INSERT INTO t1 VALUES ( NULL ); ALTER TABLE t1 ADD COLUMN b INT NOT NULL ; connect stop_purge,localhost,root; START TRANSACTION WITH CONSISTENT SNAPSHOT; connect ddl,localhost,root,,test; DELETE FROM t1; SET DEBUG_SYNC= 'row_log_table_apply1_before SIGNAL copied WAIT_FOR logged' ; send ALTER TABLE t1 FORCE ; connection default ; SET DEBUG_SYNC= 'now WAIT_FOR copied' ;   BEGIN ; INSERT INTO t1 SET b=1; ROLLBACK ; disconnect stop_purge;   # Wait for purge to empty the table . # (This is based on wait_all_purged.inc, but there are 2 transactions # from the pending ALTER TABLE t1 FORCE .)   let $wait_counter= 300; while ($wait_counter) { --replace_regex /.*History list length ([0-9]+).*/\1/ let $remaining= `SHOW ENGINE INNODB STATUS`; if ($remaining == 'InnoDB 2' ) { let $wait_counter= 0; } if ($wait_counter) { real_sleep 0.1; dec $wait_counter; } } echo $remaining transactions not purged; SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;   SET DEBUG_SYNC= 'now SIGNAL logged' ; disconnect ddl; DROP TABLE t1; SET DEBUG_SYNC= 'RESET' ; It is important that we wait for the purge after the INSERT and ROLLBACK  have been logged. In that way, the log for the INSERT and the ROLLBACK  will be written as if the table is in the instant-add format. When the log is applied, the source table would be in the plain format, and the log would be parsed incorrectly.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.