Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24653

Assertion `block->page.id().page_no() == index->page' failed in innobase_instant_try

Details

    Description

      Possibly related to MDEV-22988, MDEV-11369. Replay via CLI required. SOURCE and Bash redirect to client do not work. CLI reproduces sporadically. MTR may not work either.

      10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

      10.6.0>SET GLOBAL innodb_limit_optimistic_insert_debug=2;
      Query OK, 0 rows affected (0.000 sec)
       
      10.6.0>CREATE TABLE t (c INT) ENGINE=InnoDB;
      Query OK, 0 rows affected (0.014 sec)
       
      10.6.0>INSERT INTO t VALUES (1),(2),(3),(4);
      Query OK, 4 rows affected (0.007 sec)
      Records: 4  Duplicates: 0  Warnings: 0
       
      10.6.0>ALTER TABLE t ADD COLUMN (a INT);
      Query OK, 0 rows affected (0.005 sec)
      Records: 0  Duplicates: 0  Warnings: 0
       
      10.6.0>DELETE FROM t;
      Query OK, 4 rows affected (0.001 sec)
       
      10.6.0>ALTER TABLE t ADD COLUMN (b INT);
      ERROR 2013 (HY000): Lost connection to MySQL server during query
      

      SET GLOBAL innodb_limit_optimistic_insert_debug=2;
      CREATE TABLE t (c INT) ENGINE=MyISAM;
      INSERT INTO t VALUES (1),(2),(3),(4);
      ALTER TABLE t ADD COLUMN (a INT);
      DELETE FROM t;
      ALTER TABLE t ADD COLUMN (b INT);
      

      Leads to:

      10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

      mysqld: /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:5916: bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*): Assertion `block->page.id().page_no() == index->page' failed.
      

      10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

      Core was generated by `/test/MD010121-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGABRT, Aborted.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x14f168440700 (LWP 2019357))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1  0x0000563bb11b60d7 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
      #2  0x0000563bb094aab1 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #5  0x000014f16affe859 in __GI_abort () at abort.c:79
      #6  0x000014f16affe729 in __assert_fail_base (fmt=0x14f16b194588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x563bb156ef48 "block->page.id().page_no() == index->page", file=0x563bb1566920 "/test/10.6_dbg/storage/innobase/handler/handler0alter.cc", line=5916, function=<optimized out>) at assert.c:92
      #7  0x000014f16b00ff36 in __GI___assert_fail (assertion=assertion@entry=0x563bb156ef48 "block->page.id().page_no() == index->page", file=file@entry=0x563bb1566920 "/test/10.6_dbg/storage/innobase/handler/handler0alter.cc", line=line@entry=5916, function=function@entry=0x563bb156e8b8 "bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*)") at assert.c:101
      #8  0x0000563bb0da1f33 in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x14f16843c750, ctx=ctx@entry=0x14f124013f48, altered_table=altered_table@entry=0x14f16843c7f0, table=table@entry=0x14f12406bd28, trx=trx@entry=0x14f1686fc378) at /test/10.6_dbg/storage/innobase/include/buf0types.h:159
      #9  0x0000563bb0da424e in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x14f16843c750, ctx=ctx@entry=0x14f124013f48, altered_table=altered_table@entry=0x14f16843c7f0, old_table=0x14f12406bd28, trx=trx@entry=0x14f1686fc378, table_name=<optimized out>) at /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:10212
      #10 0x0000563bb0d8c147 in ha_innobase::commit_inplace_alter_table (this=0x14f1240233f0, altered_table=0x14f16843c7f0, ha_alter_info=0x14f16843c750, commit=true) at /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:10975
      #11 0x0000563bb09592c5 in handler::ha_commit_inplace_alter_table (this=0x14f1240233f0, altered_table=altered_table@entry=0x14f16843c7f0, ha_alter_info=ha_alter_info@entry=0x14f16843c750, commit=commit@entry=true) at /test/10.6_dbg/sql/handler.cc:4848
      #12 0x0000563bb074e43e in mysql_inplace_alter_table (thd=thd@entry=0x14f124000db8, table_list=0x14f1240127b0, table=table@entry=0x14f12406bd28, altered_table=altered_table@entry=0x14f16843c7f0, ha_alter_info=ha_alter_info@entry=0x14f16843c750, target_mdl_request=target_mdl_request@entry=0x14f16843cd40, alter_ctx=0x14f16843d890) at /test/10.6_dbg/sql/sql_table.cc:8107
      #13 0x0000563bb0761b1f in mysql_alter_table (thd=thd@entry=0x14f124000db8, new_db=new_db@entry=0x14f124005828, new_name=new_name@entry=0x14f124005c28, create_info=create_info@entry=0x14f16843e4a0, table_list=<optimized out>, table_list@entry=0x14f1240127b0, alter_info=alter_info@entry=0x14f16843e3d0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6_dbg/sql/sql_table.cc:10651
      #14 0x0000563bb07e7640 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x14f124000db8) at /test/10.6_dbg/sql/structs.h:559
      #15 0x0000563bb068b4c6 in mysql_execute_command (thd=thd@entry=0x14f124000db8) at /test/10.6_dbg/sql/sql_parse.cc:5858
      #16 0x0000563bb0672072 in mysql_parse (thd=thd@entry=0x14f124000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14f16843f3d0) at /test/10.6_dbg/sql/sql_parse.cc:7881
      #17 0x0000563bb06801ec in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x14f124000db8, packet=packet@entry=0x14f124008d39 "ALTER TABLE t ADD COLUMN (b INT)", packet_length=packet_length@entry=32) at /test/10.6_dbg/sql/sql_class.h:1293
      #18 0x0000563bb068352d in do_command (thd=0x14f124000db8) at /test/10.6_dbg/sql/sql_parse.cc:1348
      #19 0x0000563bb07df7fc in do_handle_one_connection (connect=<optimized out>, connect@entry=0x563bb3e9b9c8, put_in_cache=put_in_cache@entry=true) at /test/10.6_dbg/sql/sql_connect.cc:1410
      #20 0x0000563bb07dff03 in handle_one_connection (arg=arg@entry=0x563bb3e9b9c8) at /test/10.6_dbg/sql/sql_connect.cc:1312
      #21 0x0000563bb0c9588f in pfs_spawn_thread (arg=0x563bb3dc38b8) at /test/10.6_dbg/storage/perfschema/pfs.cc:2201
      #22 0x000014f16b50c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #23 0x000014f16b0fb293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Nothing relevant observed in ASAN/UBSAN. No obvious problems on optimized builds.

      Attachments

        Issue Links

          Activity

            Roel Roel Van de Paar added a comment - - edited

            TLDR; patch resolves issue on both 10.6 and 10.5.

            It is otherwise unclear as to why some non-patched mysqld's do not reproduce the issue. It seemed version, hardware or cmake options related, but nothing of those was conclusive, and the cause for this is likely elsewhere.

            Roel Roel Van de Paar added a comment - - edited TLDR; patch resolves issue on both 10.6 and 10.5. It is otherwise unclear as to why some non-patched mysqld's do not reproduce the issue. It seemed version, hardware or cmake options related, but nothing of those was conclusive, and the cause for this is likely elsewhere.

            Roel, I created what I believe is a reproducible test case for this. The randomness that you experienced was probably related to the timing of the purge of history. I added an explicit wait for it.

            --source include/have_innodb.inc
            --source include/have_debug.inc
             
            SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET @saved_limit = @@GLOBAL.innodb_limit_optimistic_insert_debug;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
            SET GLOBAL innodb_limit_optimistic_insert_debug = 2;
             
            CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (1),(2),(3),(4);
            ALTER TABLE t1 ADD COLUMN b INT;
            DELETE FROM t1;
            --source include/wait_all_purged.inc
            ALTER TABLE t1 ADD COLUMN c INT;
             
            SELECT * FROM t1;
            DROP TABLE t1;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
            SET GLOBAL innodb_limit_optimistic_insert_debug = @saved_limit;
            

            Using this test case, I figured out why we end up with this situation. Because of innodb_limit_optimistic_insert_debug = 2, in the end we will only have the hidden ADD COLUMN metadata record in page 4. The right sibling of the page is deleted in purge:

            10.3 4e503aec7feaeb2297035f565ca92c98d4557577

            #2  0x0000556d400afcb3 in btr_page_set_next (page=0x7f58c7720000 "", 
                page_zip=0x0, next=4294967295, mtr=0x7f589f7fd280)
                at /mariadb/10.3/storage/innobase/include/btr0btr.ic:156
            #3  0x0000556d400bae6c in btr_level_list_remove_func (space=5, page_size=
                  @0x7f589f7fc8e0: {m_physical = 16384, m_logical = 16384, m_is_compressed = 0}, page=0x7f58c7724000 "", index=0x7f58bc0801e0, mtr=0x7f589f7fd280)
                at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:3194
            #4  0x0000556d400bfa0a in btr_discard_page (cursor=0x556d42947578, 
                mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:4215
            #5  0x0000556d400f59f7 in btr_cur_pessimistic_delete (err=0x7f589f7fcfe0, 
                has_reserved_extents=0, cursor=0x556d42947578, flags=0, rollback=false, 
                mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5772
            #6  0x0000556d3ffb9d0c in row_purge_remove_clust_if_poss_low (
                node=0x556d429474d0, mode=65569)
                at /mariadb/10.3/storage/innobase/row/row0purge.cc:149
            #7  0x0000556d3ffb6771 in row_purge_remove_clust_if_poss (node=0x556d429474d0)
                at /mariadb/10.3/storage/innobase/row/row0purge.cc:198
            #8  0x0000556d3ffb4e4e in row_purge_del_mark (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:781
            

            At this point, we are missing the opportunity to raise the surviving sibling page up by one level, which would be the opposite of to the btr_page_split_and_insert() operation.

            I think that my original proposed fix is the lowest-risk resolution for this bug. Optimizing the B-tree space management (in the above stack trace) would be rather risky and best done in a development branch, not in GA versions.

            marko Marko Mäkelä added a comment - Roel , I created what I believe is a reproducible test case for this. The randomness that you experienced was probably related to the timing of the purge of history. I added an explicit wait for it. --source include/have_innodb.inc --source include/have_debug.inc   SET @saved_frequency = @@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET @saved_limit = @@ GLOBAL .innodb_limit_optimistic_insert_debug; SET GLOBAL innodb_purge_rseg_truncate_frequency = 1; SET GLOBAL innodb_limit_optimistic_insert_debug = 2;   CREATE TABLE t1 (a INT PRIMARY KEY ) ENGINE=InnoDB; INSERT INTO t1 VALUES (1),(2),(3),(4); ALTER TABLE t1 ADD COLUMN b INT ; DELETE FROM t1; --source include/wait_all_purged.inc ALTER TABLE t1 ADD COLUMN c INT ;   SELECT * FROM t1; DROP TABLE t1; SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency; SET GLOBAL innodb_limit_optimistic_insert_debug = @saved_limit; Using this test case, I figured out why we end up with this situation. Because of innodb_limit_optimistic_insert_debug = 2 , in the end we will only have the hidden ADD COLUMN metadata record in page 4. The right sibling of the page is deleted in purge: 10.3 4e503aec7feaeb2297035f565ca92c98d4557577 #2 0x0000556d400afcb3 in btr_page_set_next (page=0x7f58c7720000 "", page_zip=0x0, next=4294967295, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/include/btr0btr.ic:156 #3 0x0000556d400bae6c in btr_level_list_remove_func (space=5, page_size= @0x7f589f7fc8e0: {m_physical = 16384, m_logical = 16384, m_is_compressed = 0}, page=0x7f58c7724000 "", index=0x7f58bc0801e0, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:3194 #4 0x0000556d400bfa0a in btr_discard_page (cursor=0x556d42947578, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:4215 #5 0x0000556d400f59f7 in btr_cur_pessimistic_delete (err=0x7f589f7fcfe0, has_reserved_extents=0, cursor=0x556d42947578, flags=0, rollback=false, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5772 #6 0x0000556d3ffb9d0c in row_purge_remove_clust_if_poss_low ( node=0x556d429474d0, mode=65569) at /mariadb/10.3/storage/innobase/row/row0purge.cc:149 #7 0x0000556d3ffb6771 in row_purge_remove_clust_if_poss (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:198 #8 0x0000556d3ffb4e4e in row_purge_del_mark (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:781 At this point, we are missing the opportunity to raise the surviving sibling page up by one level, which would be the opposite of to the btr_page_split_and_insert() operation. I think that my original proposed fix is the lowest-risk resolution for this bug. Optimizing the B-tree space management (in the above stack trace) would be rather risky and best done in a development branch, not in GA versions.

            Thank you marko.

            Roel Roel Van de Paar added a comment - Thank you marko .

            I adjusted btr_pcur_store_position() as well.

            marko Marko Mäkelä added a comment - I adjusted btr_pcur_store_position() as well.

            This fix is now merged up to the 10.6 branch. I filed MDEV-24673 for the failure to shrink the index tree.

            marko Marko Mäkelä added a comment - This fix is now merged up to the 10.6 branch. I filed MDEV-24673 for the failure to shrink the index tree.

            People

              marko Marko Mäkelä
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.