Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24653

Assertion `block->page.id().page_no() == index->page' failed in innobase_instant_try

Details

    Description

      Possibly related to MDEV-22988, MDEV-11369. Replay via CLI required. SOURCE and Bash redirect to client do not work. CLI reproduces sporadically. MTR may not work either.

      10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

      10.6.0>SET GLOBAL innodb_limit_optimistic_insert_debug=2;
      Query OK, 0 rows affected (0.000 sec)
       
      10.6.0>CREATE TABLE t (c INT) ENGINE=InnoDB;
      Query OK, 0 rows affected (0.014 sec)
       
      10.6.0>INSERT INTO t VALUES (1),(2),(3),(4);
      Query OK, 4 rows affected (0.007 sec)
      Records: 4  Duplicates: 0  Warnings: 0
       
      10.6.0>ALTER TABLE t ADD COLUMN (a INT);
      Query OK, 0 rows affected (0.005 sec)
      Records: 0  Duplicates: 0  Warnings: 0
       
      10.6.0>DELETE FROM t;
      Query OK, 4 rows affected (0.001 sec)
       
      10.6.0>ALTER TABLE t ADD COLUMN (b INT);
      ERROR 2013 (HY000): Lost connection to MySQL server during query
      

      SET GLOBAL innodb_limit_optimistic_insert_debug=2;
      CREATE TABLE t (c INT) ENGINE=MyISAM;
      INSERT INTO t VALUES (1),(2),(3),(4);
      ALTER TABLE t ADD COLUMN (a INT);
      DELETE FROM t;
      ALTER TABLE t ADD COLUMN (b INT);
      

      Leads to:

      10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

      mysqld: /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:5916: bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*): Assertion `block->page.id().page_no() == index->page' failed.
      

      10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

      Core was generated by `/test/MD010121-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGABRT, Aborted.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x14f168440700 (LWP 2019357))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1  0x0000563bb11b60d7 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
      #2  0x0000563bb094aab1 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #5  0x000014f16affe859 in __GI_abort () at abort.c:79
      #6  0x000014f16affe729 in __assert_fail_base (fmt=0x14f16b194588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x563bb156ef48 "block->page.id().page_no() == index->page", file=0x563bb1566920 "/test/10.6_dbg/storage/innobase/handler/handler0alter.cc", line=5916, function=<optimized out>) at assert.c:92
      #7  0x000014f16b00ff36 in __GI___assert_fail (assertion=assertion@entry=0x563bb156ef48 "block->page.id().page_no() == index->page", file=file@entry=0x563bb1566920 "/test/10.6_dbg/storage/innobase/handler/handler0alter.cc", line=line@entry=5916, function=function@entry=0x563bb156e8b8 "bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*)") at assert.c:101
      #8  0x0000563bb0da1f33 in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x14f16843c750, ctx=ctx@entry=0x14f124013f48, altered_table=altered_table@entry=0x14f16843c7f0, table=table@entry=0x14f12406bd28, trx=trx@entry=0x14f1686fc378) at /test/10.6_dbg/storage/innobase/include/buf0types.h:159
      #9  0x0000563bb0da424e in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x14f16843c750, ctx=ctx@entry=0x14f124013f48, altered_table=altered_table@entry=0x14f16843c7f0, old_table=0x14f12406bd28, trx=trx@entry=0x14f1686fc378, table_name=<optimized out>) at /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:10212
      #10 0x0000563bb0d8c147 in ha_innobase::commit_inplace_alter_table (this=0x14f1240233f0, altered_table=0x14f16843c7f0, ha_alter_info=0x14f16843c750, commit=true) at /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:10975
      #11 0x0000563bb09592c5 in handler::ha_commit_inplace_alter_table (this=0x14f1240233f0, altered_table=altered_table@entry=0x14f16843c7f0, ha_alter_info=ha_alter_info@entry=0x14f16843c750, commit=commit@entry=true) at /test/10.6_dbg/sql/handler.cc:4848
      #12 0x0000563bb074e43e in mysql_inplace_alter_table (thd=thd@entry=0x14f124000db8, table_list=0x14f1240127b0, table=table@entry=0x14f12406bd28, altered_table=altered_table@entry=0x14f16843c7f0, ha_alter_info=ha_alter_info@entry=0x14f16843c750, target_mdl_request=target_mdl_request@entry=0x14f16843cd40, alter_ctx=0x14f16843d890) at /test/10.6_dbg/sql/sql_table.cc:8107
      #13 0x0000563bb0761b1f in mysql_alter_table (thd=thd@entry=0x14f124000db8, new_db=new_db@entry=0x14f124005828, new_name=new_name@entry=0x14f124005c28, create_info=create_info@entry=0x14f16843e4a0, table_list=<optimized out>, table_list@entry=0x14f1240127b0, alter_info=alter_info@entry=0x14f16843e3d0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6_dbg/sql/sql_table.cc:10651
      #14 0x0000563bb07e7640 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x14f124000db8) at /test/10.6_dbg/sql/structs.h:559
      #15 0x0000563bb068b4c6 in mysql_execute_command (thd=thd@entry=0x14f124000db8) at /test/10.6_dbg/sql/sql_parse.cc:5858
      #16 0x0000563bb0672072 in mysql_parse (thd=thd@entry=0x14f124000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14f16843f3d0) at /test/10.6_dbg/sql/sql_parse.cc:7881
      #17 0x0000563bb06801ec in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x14f124000db8, packet=packet@entry=0x14f124008d39 "ALTER TABLE t ADD COLUMN (b INT)", packet_length=packet_length@entry=32) at /test/10.6_dbg/sql/sql_class.h:1293
      #18 0x0000563bb068352d in do_command (thd=0x14f124000db8) at /test/10.6_dbg/sql/sql_parse.cc:1348
      #19 0x0000563bb07df7fc in do_handle_one_connection (connect=<optimized out>, connect@entry=0x563bb3e9b9c8, put_in_cache=put_in_cache@entry=true) at /test/10.6_dbg/sql/sql_connect.cc:1410
      #20 0x0000563bb07dff03 in handle_one_connection (arg=arg@entry=0x563bb3e9b9c8) at /test/10.6_dbg/sql/sql_connect.cc:1312
      #21 0x0000563bb0c9588f in pfs_spawn_thread (arg=0x563bb3dc38b8) at /test/10.6_dbg/storage/perfschema/pfs.cc:2201
      #22 0x000014f16b50c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #23 0x000014f16b0fb293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Nothing relevant observed in ASAN/UBSAN. No obvious problems on optimized builds.

      Attachments

        Issue Links

          Activity

            Roel Roel Van de Paar added a comment - - edited

            Alternative testcase

            SET GLOBAL innodb_limit_optimistic_insert_debug = 2;
            CREATE TABLE t1 (c1 VARCHAR(10));
            INSERT INTO t1 VALUES (41), (42), (43), (44);
            ALTER TABLE t1 ADD COLUMN (i INT);
            DELETE FROM t1;
            ALTER TABLE t1 ADD COLUMN (b INT);
            

            Roel Roel Van de Paar added a comment - - edited Alternative testcase SET GLOBAL innodb_limit_optimistic_insert_debug = 2; CREATE TABLE t1 (c1 VARCHAR(10)); INSERT INTO t1 VALUES (41), (42), (43), (44); ALTER TABLE t1 ADD COLUMN (i INT); DELETE FROM t1; ALTER TABLE t1 ADD COLUMN (b INT);
            Roel Roel Van de Paar added a comment - - edited

            rr trace available on rr box, based on alternative testcase

            cd /test/MD220121-mariadb-10.6.0-linux-x86_64-dbg/rr/mysqld-0
            rr replay "${PWD}"
            

            (rr) continue;
            Thread 2 received signal SIGABRT, Aborted.
            [Switching to Thread 3879826.3879889]
            __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
            (rr) bt
            #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1  0x00005e2304652859 in __GI_abort () at abort.c:79
            #2  0x00005e2304652729 in __assert_fail_base (fmt=0x5e23047e8588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
                assertion=0x55f34a45b210 "block->page.id().page_no() == index->page", 
                file=0x55f34a452bf0 "/test/10.6b_dbg/storage/innobase/handler/handler0alter.cc", line=5916, 
                function=<optimized out>) at assert.c:92
            #3  0x00005e2304663f36 in __GI___assert_fail (
                assertion=assertion@entry=0x55f34a45b210 "block->page.id().page_no() == index->page", 
                file=file@entry=0x55f34a452bf0 "/test/10.6b_dbg/storage/innobase/handler/handler0alter.cc", line=line@entry=5916, 
                function=function@entry=0x55f34a45ab80 "bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*)") at assert.c:101
            #4  0x000055f349c82719 in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x14c53a440750, 
                ctx=ctx@entry=0x590534013f58, altered_table=altered_table@entry=0x14c53a4407f0, table=table@entry=0x59053406aa18, 
                trx=trx@entry=0x14c5441da378) at /test/10.6b_dbg/storage/innobase/include/buf0types.h:159
            #5  0x000055f349c84a7e in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x14c53a440750, 
                ctx=ctx@entry=0x590534013f58, altered_table=altered_table@entry=0x14c53a4407f0, old_table=0x59053406aa18, 
                trx=trx@entry=0x14c5441da378, table_name=<optimized out>)
                at /test/10.6b_dbg/storage/innobase/handler/handler0alter.cc:10212
            #6  0x000055f349c6c8c4 in ha_innobase::commit_inplace_alter_table (this=0x59053406d580, altered_table=0x14c53a4407f0, 
                ha_alter_info=0x14c53a440750, commit=true) at /test/10.6b_dbg/storage/innobase/handler/handler0alter.cc:10975
            #7  0x000055f349838abd in handler::ha_commit_inplace_alter_table (this=0x59053406d580, 
                altered_table=altered_table@entry=0x14c53a4407f0, ha_alter_info=ha_alter_info@entry=0x14c53a440750, 
                commit=commit@entry=true) at /test/10.6b_dbg/sql/handler.cc:4848
            #8  0x000055f34962f3cc in mysql_inplace_alter_table (thd=thd@entry=0x590534000db8, table_list=0x5905340127b0, 
                table=table@entry=0x59053406aa18, altered_table=altered_table@entry=0x14c53a4407f0, 
                ha_alter_info=ha_alter_info@entry=0x14c53a440750, target_mdl_request=target_mdl_request@entry=0x14c53a440d40, 
                alter_ctx=0x14c53a441890) at /test/10.6b_dbg/sql/sql_table.cc:8107
            #9  0x000055f3496429ff in mysql_alter_table (thd=thd@entry=0x590534000db8, new_db=new_db@entry=0x590534005828, new_name=new_name@entry=0x590534005c28, create_info=create_info@entry=0x14c53a4424a0, table_list=<optimized out>, table_list@entry=0x5905340127b0, alter_info=alter_info@entry=0x14c53a4423d0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6b_dbg/sql/sql_table.cc:10651
            #10 0x000055f3496c83bf in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x590534000db8) at /test/10.6b_dbg/sql/structs.h:559
            #11 0x000055f34956e1ec in mysql_execute_command (thd=thd@entry=0x590534000db8) at /test/10.6b_dbg/sql/sql_parse.cc:5858
            #12 0x000055f349554e6e in mysql_parse (thd=thd@entry=0x590534000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14c53a4433d0) at /test/10.6b_dbg/sql/sql_parse.cc:7881
            #13 0x000055f349562f0d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x590534000db8, packet=packet@entry=0x590534008d39 "ALTER TABLE t1 ADD COLUMN (b INT)", packet_length=packet_length@entry=33) at /test/10.6b_dbg/sql/sql_class.h:1293
            #14 0x000055f349566236 in do_command (thd=0x590534000db8) at /test/10.6b_dbg/sql/sql_parse.cc:1348
            #15 0x000055f3496c05eb in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55f34d954578, put_in_cache=put_in_cache@entry=true) at /test/10.6b_dbg/sql/sql_connect.cc:1410
            #16 0x000055f3496c0cef in handle_one_connection (arg=arg@entry=0x55f34d954578) at /test/10.6b_dbg/sql/sql_connect.cc:1312
            #17 0x000055f349b74f27 in pfs_spawn_thread (arg=0x55f34d87c5a8) at /test/10.6b_dbg/storage/perfschema/pfs.cc:2201
            #18 0x000052580d6b1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #19 0x00005e230474f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            Roel Roel Van de Paar added a comment - - edited rr trace available on rr box, based on alternative testcase cd /test/MD220121-mariadb-10.6.0-linux-x86_64-dbg/rr/mysqld-0 rr replay "${PWD}" (rr) continue; Thread 2 received signal SIGABRT, Aborted. [Switching to Thread 3879826.3879889] __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (rr) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00005e2304652859 in __GI_abort () at abort.c:79 #2 0x00005e2304652729 in __assert_fail_base (fmt=0x5e23047e8588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55f34a45b210 "block->page.id().page_no() == index->page", file=0x55f34a452bf0 "/test/10.6b_dbg/storage/innobase/handler/handler0alter.cc", line=5916, function=<optimized out>) at assert.c:92 #3 0x00005e2304663f36 in __GI___assert_fail ( assertion=assertion@entry=0x55f34a45b210 "block->page.id().page_no() == index->page", file=file@entry=0x55f34a452bf0 "/test/10.6b_dbg/storage/innobase/handler/handler0alter.cc", line=line@entry=5916, function=function@entry=0x55f34a45ab80 "bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*)") at assert.c:101 #4 0x000055f349c82719 in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x14c53a440750, ctx=ctx@entry=0x590534013f58, altered_table=altered_table@entry=0x14c53a4407f0, table=table@entry=0x59053406aa18, trx=trx@entry=0x14c5441da378) at /test/10.6b_dbg/storage/innobase/include/buf0types.h:159 #5 0x000055f349c84a7e in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x14c53a440750, ctx=ctx@entry=0x590534013f58, altered_table=altered_table@entry=0x14c53a4407f0, old_table=0x59053406aa18, trx=trx@entry=0x14c5441da378, table_name=<optimized out>) at /test/10.6b_dbg/storage/innobase/handler/handler0alter.cc:10212 #6 0x000055f349c6c8c4 in ha_innobase::commit_inplace_alter_table (this=0x59053406d580, altered_table=0x14c53a4407f0, ha_alter_info=0x14c53a440750, commit=true) at /test/10.6b_dbg/storage/innobase/handler/handler0alter.cc:10975 #7 0x000055f349838abd in handler::ha_commit_inplace_alter_table (this=0x59053406d580, altered_table=altered_table@entry=0x14c53a4407f0, ha_alter_info=ha_alter_info@entry=0x14c53a440750, commit=commit@entry=true) at /test/10.6b_dbg/sql/handler.cc:4848 #8 0x000055f34962f3cc in mysql_inplace_alter_table (thd=thd@entry=0x590534000db8, table_list=0x5905340127b0, table=table@entry=0x59053406aa18, altered_table=altered_table@entry=0x14c53a4407f0, ha_alter_info=ha_alter_info@entry=0x14c53a440750, target_mdl_request=target_mdl_request@entry=0x14c53a440d40, alter_ctx=0x14c53a441890) at /test/10.6b_dbg/sql/sql_table.cc:8107 #9 0x000055f3496429ff in mysql_alter_table (thd=thd@entry=0x590534000db8, new_db=new_db@entry=0x590534005828, new_name=new_name@entry=0x590534005c28, create_info=create_info@entry=0x14c53a4424a0, table_list=<optimized out>, table_list@entry=0x5905340127b0, alter_info=alter_info@entry=0x14c53a4423d0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6b_dbg/sql/sql_table.cc:10651 #10 0x000055f3496c83bf in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x590534000db8) at /test/10.6b_dbg/sql/structs.h:559 #11 0x000055f34956e1ec in mysql_execute_command (thd=thd@entry=0x590534000db8) at /test/10.6b_dbg/sql/sql_parse.cc:5858 #12 0x000055f349554e6e in mysql_parse (thd=thd@entry=0x590534000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14c53a4433d0) at /test/10.6b_dbg/sql/sql_parse.cc:7881 #13 0x000055f349562f0d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x590534000db8, packet=packet@entry=0x590534008d39 "ALTER TABLE t1 ADD COLUMN (b INT)", packet_length=packet_length@entry=33) at /test/10.6b_dbg/sql/sql_class.h:1293 #14 0x000055f349566236 in do_command (thd=0x590534000db8) at /test/10.6b_dbg/sql/sql_parse.cc:1348 #15 0x000055f3496c05eb in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55f34d954578, put_in_cache=put_in_cache@entry=true) at /test/10.6b_dbg/sql/sql_connect.cc:1410 #16 0x000055f3496c0cef in handle_one_connection (arg=arg@entry=0x55f34d954578) at /test/10.6b_dbg/sql/sql_connect.cc:1312 #17 0x000055f349b74f27 in pfs_spawn_thread (arg=0x55f34d87c5a8) at /test/10.6b_dbg/storage/perfschema/pfs.cc:2201 #18 0x000052580d6b1609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #19 0x00005e230474f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

            In the rr replay trace, I see that the table is empty except for the metadata record, but the leaf page is not the root page. The root page (number 3) contains a pointer to the empty child page (number 4). The assertion fails, because an empty table is assumed to consist of an empty leaf page.

            A similar false assumption was fixed as a part of MDEV-14126.
            Roel, please check if the patch on top of bb-10.6-MDEV-24653 would fix the problem. (It should cherry-pick cleanly to 10.5 as well, but not earlier versions.) And please preserve the rr replay trace (and the executables) for further analysis. I would like to see why we missed the opportunity to shrink the tree. In any case, I think that this corruption needs to be urgently fixed in our upcoming releases, hence I would propose my quick and simple patch, and investigate the reason for the non-canonical tree state later.

            marko Marko Mäkelä added a comment - In the rr replay trace, I see that the table is empty except for the metadata record, but the leaf page is not the root page. The root page (number 3) contains a pointer to the empty child page (number 4). The assertion fails, because an empty table is assumed to consist of an empty leaf page. A similar false assumption was fixed as a part of MDEV-14126 . Roel , please check if the patch on top of bb-10.6- MDEV-24653 would fix the problem. (It should cherry-pick cleanly to 10.5 as well, but not earlier versions.) And please preserve the rr replay trace (and the executables) for further analysis. I would like to see why we missed the opportunity to shrink the tree. In any case, I think that this corruption needs to be urgently fixed in our upcoming releases, hence I would propose my quick and simple patch, and investigate the reason for the non-canonical tree state later.
            Roel Roel Van de Paar added a comment - - edited

            I tested first against bb-10.6-MDEV-24653 integrally (commit a6937dfa1d7868205e3c67ab5aa0fe4d75815c09) and confirmed bug does not shows there with both single threaded plus repeated runs and multi-threaded plus repeated runs. Bug did not show. This is not exclusive/reliable, as alluded to in Marko's comment (requirement to patch previously used version with patch, and that (RV add) on the same machine), given that newer runs also did not show the issue thus far (strike-through: now have, ref below). Thus tested that at length, ref next comments.

            Roel Roel Van de Paar added a comment - - edited I tested first against bb-10.6- MDEV-24653 integrally (commit a6937dfa1d7868205e3c67ab5aa0fe4d75815c09) and confirmed bug does not shows there with both single threaded plus repeated runs and multi-threaded plus repeated runs. Bug did not show. This is not exclusive/reliable, as alluded to in Marko's comment (requirement to patch previously used version with patch, and that (RV add) on the same machine), given that newer runs also did not show the issue thus far (strike-through: now have, ref below). Thus tested that at length, ref next comments.
            Roel Roel Van de Paar added a comment - - edited

            Confirmed that the version used originally in this bug (9118fd360a3da0bba521caf2a35c424968235ac4) with the patch at the top of bb-10.6-MDEV-24653 seems to fix this bug (no crashes observed), and no other oddities of any kind were observed. Confirmed with both single threaded plus repeated runs and multi-threaded plus repeated runs. Reverse confirmed also. Patch looks good marko. Thank you

            Roel Roel Van de Paar added a comment - - edited Confirmed that the version used originally in this bug (9118fd360a3da0bba521caf2a35c424968235ac4) with the patch at the top of bb-10.6- MDEV-24653 seems to fix this bug (no crashes observed), and no other oddities of any kind were observed. Confirmed with both single threaded plus repeated runs and multi-threaded plus repeated runs. Reverse confirmed also. Patch looks good marko . Thank you
            Roel Roel Van de Paar added a comment - - edited

            Tested the same on 10.5 (rev 139c85aafd4e4938f95843d44a455265a49b572e where the issue reproduces fine) with the same patch. Bug also disappears. Same tests executed. Patch thus works on 10.5 also.

            Roel Roel Van de Paar added a comment - - edited Tested the same on 10.5 (rev 139c85aafd4e4938f95843d44a455265a49b572e where the issue reproduces fine) with the same patch. Bug also disappears. Same tests executed. Patch thus works on 10.5 also.
            Roel Roel Van de Paar added a comment - - edited

            Non-patched version 10.6.0 at rev 9118fd360a3da0bba521caf2a35c424968235ac4 (dbg, build 19-Jan-21) still shows the issue. It may be that issue reproducibility depends on the hardware used.

            Roel Roel Van de Paar added a comment - - edited Non-patched version 10.6.0 at rev 9118fd360a3da0bba521caf2a35c424968235ac4 (dbg, build 19-Jan-21) still shows the issue. It may be that issue reproducibility depends on the hardware used.
            Roel Roel Van de Paar added a comment - - edited

            Non-patched version 10.6.0 at 0d7380fdac1add2be3fdb57519ccd8ac9c8e12bc (head of trunk, build today) still shows the issue. So yes seems issue reproducibility is dependent on hardware used and not on later versions. Update: hardware seems not relevant either. It is not clear to me at this point what causes the issue to reproduce in some instances and not in others. Perhaps marko knows. The fix testing is however fully stable it seems as I used both non-patched and patched builds of the same version, and issue consistently reproduces on non-patched versions and never reproduces on patched versions.

            Roel Roel Van de Paar added a comment - - edited Non-patched version 10.6.0 at 0d7380fdac1add2be3fdb57519ccd8ac9c8e12bc (head of trunk, build today) still shows the issue. So yes seems issue reproducibility is dependent on hardware used and not on later versions. Update: hardware seems not relevant either. It is not clear to me at this point what causes the issue to reproduce in some instances and not in others. Perhaps marko knows. The fix testing is however fully stable it seems as I used both non-patched and patched builds of the same version, and issue consistently reproduces on non-patched versions and never reproduces on patched versions.
            Roel Roel Van de Paar added a comment - - edited

            TLDR; patch resolves issue on both 10.6 and 10.5.

            It is otherwise unclear as to why some non-patched mysqld's do not reproduce the issue. It seemed version, hardware or cmake options related, but nothing of those was conclusive, and the cause for this is likely elsewhere.

            Roel Roel Van de Paar added a comment - - edited TLDR; patch resolves issue on both 10.6 and 10.5. It is otherwise unclear as to why some non-patched mysqld's do not reproduce the issue. It seemed version, hardware or cmake options related, but nothing of those was conclusive, and the cause for this is likely elsewhere.

            Roel, I created what I believe is a reproducible test case for this. The randomness that you experienced was probably related to the timing of the purge of history. I added an explicit wait for it.

            --source include/have_innodb.inc
            --source include/have_debug.inc
             
            SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET @saved_limit = @@GLOBAL.innodb_limit_optimistic_insert_debug;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
            SET GLOBAL innodb_limit_optimistic_insert_debug = 2;
             
            CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (1),(2),(3),(4);
            ALTER TABLE t1 ADD COLUMN b INT;
            DELETE FROM t1;
            --source include/wait_all_purged.inc
            ALTER TABLE t1 ADD COLUMN c INT;
             
            SELECT * FROM t1;
            DROP TABLE t1;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
            SET GLOBAL innodb_limit_optimistic_insert_debug = @saved_limit;
            

            Using this test case, I figured out why we end up with this situation. Because of innodb_limit_optimistic_insert_debug = 2, in the end we will only have the hidden ADD COLUMN metadata record in page 4. The right sibling of the page is deleted in purge:

            10.3 4e503aec7feaeb2297035f565ca92c98d4557577

            #2  0x0000556d400afcb3 in btr_page_set_next (page=0x7f58c7720000 "", 
                page_zip=0x0, next=4294967295, mtr=0x7f589f7fd280)
                at /mariadb/10.3/storage/innobase/include/btr0btr.ic:156
            #3  0x0000556d400bae6c in btr_level_list_remove_func (space=5, page_size=
                  @0x7f589f7fc8e0: {m_physical = 16384, m_logical = 16384, m_is_compressed = 0}, page=0x7f58c7724000 "", index=0x7f58bc0801e0, mtr=0x7f589f7fd280)
                at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:3194
            #4  0x0000556d400bfa0a in btr_discard_page (cursor=0x556d42947578, 
                mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:4215
            #5  0x0000556d400f59f7 in btr_cur_pessimistic_delete (err=0x7f589f7fcfe0, 
                has_reserved_extents=0, cursor=0x556d42947578, flags=0, rollback=false, 
                mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5772
            #6  0x0000556d3ffb9d0c in row_purge_remove_clust_if_poss_low (
                node=0x556d429474d0, mode=65569)
                at /mariadb/10.3/storage/innobase/row/row0purge.cc:149
            #7  0x0000556d3ffb6771 in row_purge_remove_clust_if_poss (node=0x556d429474d0)
                at /mariadb/10.3/storage/innobase/row/row0purge.cc:198
            #8  0x0000556d3ffb4e4e in row_purge_del_mark (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:781
            

            At this point, we are missing the opportunity to raise the surviving sibling page up by one level, which would be the opposite of to the btr_page_split_and_insert() operation.

            I think that my original proposed fix is the lowest-risk resolution for this bug. Optimizing the B-tree space management (in the above stack trace) would be rather risky and best done in a development branch, not in GA versions.

            marko Marko Mäkelä added a comment - Roel , I created what I believe is a reproducible test case for this. The randomness that you experienced was probably related to the timing of the purge of history. I added an explicit wait for it. --source include/have_innodb.inc --source include/have_debug.inc   SET @saved_frequency = @@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET @saved_limit = @@ GLOBAL .innodb_limit_optimistic_insert_debug; SET GLOBAL innodb_purge_rseg_truncate_frequency = 1; SET GLOBAL innodb_limit_optimistic_insert_debug = 2;   CREATE TABLE t1 (a INT PRIMARY KEY ) ENGINE=InnoDB; INSERT INTO t1 VALUES (1),(2),(3),(4); ALTER TABLE t1 ADD COLUMN b INT ; DELETE FROM t1; --source include/wait_all_purged.inc ALTER TABLE t1 ADD COLUMN c INT ;   SELECT * FROM t1; DROP TABLE t1; SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency; SET GLOBAL innodb_limit_optimistic_insert_debug = @saved_limit; Using this test case, I figured out why we end up with this situation. Because of innodb_limit_optimistic_insert_debug = 2 , in the end we will only have the hidden ADD COLUMN metadata record in page 4. The right sibling of the page is deleted in purge: 10.3 4e503aec7feaeb2297035f565ca92c98d4557577 #2 0x0000556d400afcb3 in btr_page_set_next (page=0x7f58c7720000 "", page_zip=0x0, next=4294967295, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/include/btr0btr.ic:156 #3 0x0000556d400bae6c in btr_level_list_remove_func (space=5, page_size= @0x7f589f7fc8e0: {m_physical = 16384, m_logical = 16384, m_is_compressed = 0}, page=0x7f58c7724000 "", index=0x7f58bc0801e0, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:3194 #4 0x0000556d400bfa0a in btr_discard_page (cursor=0x556d42947578, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:4215 #5 0x0000556d400f59f7 in btr_cur_pessimistic_delete (err=0x7f589f7fcfe0, has_reserved_extents=0, cursor=0x556d42947578, flags=0, rollback=false, mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5772 #6 0x0000556d3ffb9d0c in row_purge_remove_clust_if_poss_low ( node=0x556d429474d0, mode=65569) at /mariadb/10.3/storage/innobase/row/row0purge.cc:149 #7 0x0000556d3ffb6771 in row_purge_remove_clust_if_poss (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:198 #8 0x0000556d3ffb4e4e in row_purge_del_mark (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:781 At this point, we are missing the opportunity to raise the surviving sibling page up by one level, which would be the opposite of to the btr_page_split_and_insert() operation. I think that my original proposed fix is the lowest-risk resolution for this bug. Optimizing the B-tree space management (in the above stack trace) would be rather risky and best done in a development branch, not in GA versions.

            Thank you marko.

            Roel Roel Van de Paar added a comment - Thank you marko .

            I adjusted btr_pcur_store_position() as well.

            marko Marko Mäkelä added a comment - I adjusted btr_pcur_store_position() as well.

            This fix is now merged up to the 10.6 branch. I filed MDEV-24673 for the failure to shrink the index tree.

            marko Marko Mäkelä added a comment - This fix is now merged up to the 10.6 branch. I filed MDEV-24673 for the failure to shrink the index tree.

            People

              marko Marko Mäkelä
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.