[MDEV-24653] Assertion `block->page.id().page_no() == index->page' failed in innobase_instant_try Created: 2021-01-22  Updated: 2021-03-22  Resolved: 2021-01-25

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.3.28, 10.4.18, 10.5.9

Type: Bug Priority: Blocker
Reporter: Roel Van de Paar Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.2, regression, rr-profile

Issue Links:
Problem/Incident
is caused by MDEV-11369 Instant add column for InnoDB Closed
Relates
relates to MDEV-24673 btr_cur_pessimistic_delete() fails to... Open
relates to MDEV-14126 Assertion `page_get_page_no(page) == ... Closed
relates to MDEV-24796 Assertion `page_has_next(block->frame... Closed

 Description   

Possibly related to MDEV-22988, MDEV-11369. Replay via CLI required. SOURCE and Bash redirect to client do not work. CLI reproduces sporadically. MTR may not work either.

10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

10.6.0>SET GLOBAL innodb_limit_optimistic_insert_debug=2;
Query OK, 0 rows affected (0.000 sec)
 
10.6.0>CREATE TABLE t (c INT) ENGINE=InnoDB;
Query OK, 0 rows affected (0.014 sec)
 
10.6.0>INSERT INTO t VALUES (1),(2),(3),(4);
Query OK, 4 rows affected (0.007 sec)
Records: 4  Duplicates: 0  Warnings: 0
 
10.6.0>ALTER TABLE t ADD COLUMN (a INT);
Query OK, 0 rows affected (0.005 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
10.6.0>DELETE FROM t;
Query OK, 4 rows affected (0.001 sec)
 
10.6.0>ALTER TABLE t ADD COLUMN (b INT);
ERROR 2013 (HY000): Lost connection to MySQL server during query

SET GLOBAL innodb_limit_optimistic_insert_debug=2;
CREATE TABLE t (c INT) ENGINE=MyISAM;
INSERT INTO t VALUES (1),(2),(3),(4);
ALTER TABLE t ADD COLUMN (a INT);
DELETE FROM t;
ALTER TABLE t ADD COLUMN (b INT);

Leads to:

10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

mysqld: /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:5916: bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*): Assertion `block->page.id().page_no() == index->page' failed.

10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

Core was generated by `/test/MD010121-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x14f168440700 (LWP 2019357))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000563bb11b60d7 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
#2  0x0000563bb094aab1 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:330
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x000014f16affe859 in __GI_abort () at abort.c:79
#6  0x000014f16affe729 in __assert_fail_base (fmt=0x14f16b194588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x563bb156ef48 "block->page.id().page_no() == index->page", file=0x563bb1566920 "/test/10.6_dbg/storage/innobase/handler/handler0alter.cc", line=5916, function=<optimized out>) at assert.c:92
#7  0x000014f16b00ff36 in __GI___assert_fail (assertion=assertion@entry=0x563bb156ef48 "block->page.id().page_no() == index->page", file=file@entry=0x563bb1566920 "/test/10.6_dbg/storage/innobase/handler/handler0alter.cc", line=line@entry=5916, function=function@entry=0x563bb156e8b8 "bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*)") at assert.c:101
#8  0x0000563bb0da1f33 in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x14f16843c750, ctx=ctx@entry=0x14f124013f48, altered_table=altered_table@entry=0x14f16843c7f0, table=table@entry=0x14f12406bd28, trx=trx@entry=0x14f1686fc378) at /test/10.6_dbg/storage/innobase/include/buf0types.h:159
#9  0x0000563bb0da424e in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x14f16843c750, ctx=ctx@entry=0x14f124013f48, altered_table=altered_table@entry=0x14f16843c7f0, old_table=0x14f12406bd28, trx=trx@entry=0x14f1686fc378, table_name=<optimized out>) at /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:10212
#10 0x0000563bb0d8c147 in ha_innobase::commit_inplace_alter_table (this=0x14f1240233f0, altered_table=0x14f16843c7f0, ha_alter_info=0x14f16843c750, commit=true) at /test/10.6_dbg/storage/innobase/handler/handler0alter.cc:10975
#11 0x0000563bb09592c5 in handler::ha_commit_inplace_alter_table (this=0x14f1240233f0, altered_table=altered_table@entry=0x14f16843c7f0, ha_alter_info=ha_alter_info@entry=0x14f16843c750, commit=commit@entry=true) at /test/10.6_dbg/sql/handler.cc:4848
#12 0x0000563bb074e43e in mysql_inplace_alter_table (thd=thd@entry=0x14f124000db8, table_list=0x14f1240127b0, table=table@entry=0x14f12406bd28, altered_table=altered_table@entry=0x14f16843c7f0, ha_alter_info=ha_alter_info@entry=0x14f16843c750, target_mdl_request=target_mdl_request@entry=0x14f16843cd40, alter_ctx=0x14f16843d890) at /test/10.6_dbg/sql/sql_table.cc:8107
#13 0x0000563bb0761b1f in mysql_alter_table (thd=thd@entry=0x14f124000db8, new_db=new_db@entry=0x14f124005828, new_name=new_name@entry=0x14f124005c28, create_info=create_info@entry=0x14f16843e4a0, table_list=<optimized out>, table_list@entry=0x14f1240127b0, alter_info=alter_info@entry=0x14f16843e3d0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6_dbg/sql/sql_table.cc:10651
#14 0x0000563bb07e7640 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x14f124000db8) at /test/10.6_dbg/sql/structs.h:559
#15 0x0000563bb068b4c6 in mysql_execute_command (thd=thd@entry=0x14f124000db8) at /test/10.6_dbg/sql/sql_parse.cc:5858
#16 0x0000563bb0672072 in mysql_parse (thd=thd@entry=0x14f124000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14f16843f3d0) at /test/10.6_dbg/sql/sql_parse.cc:7881
#17 0x0000563bb06801ec in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x14f124000db8, packet=packet@entry=0x14f124008d39 "ALTER TABLE t ADD COLUMN (b INT)", packet_length=packet_length@entry=32) at /test/10.6_dbg/sql/sql_class.h:1293
#18 0x0000563bb068352d in do_command (thd=0x14f124000db8) at /test/10.6_dbg/sql/sql_parse.cc:1348
#19 0x0000563bb07df7fc in do_handle_one_connection (connect=<optimized out>, connect@entry=0x563bb3e9b9c8, put_in_cache=put_in_cache@entry=true) at /test/10.6_dbg/sql/sql_connect.cc:1410
#20 0x0000563bb07dff03 in handle_one_connection (arg=arg@entry=0x563bb3e9b9c8) at /test/10.6_dbg/sql/sql_connect.cc:1312
#21 0x0000563bb0c9588f in pfs_spawn_thread (arg=0x563bb3dc38b8) at /test/10.6_dbg/storage/perfschema/pfs.cc:2201
#22 0x000014f16b50c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#23 0x000014f16b0fb293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Nothing relevant observed in ASAN/UBSAN. No obvious problems on optimized builds.



 Comments   
Comment by Roel Van de Paar [ 2021-01-22 ]

Alternative testcase

SET GLOBAL innodb_limit_optimistic_insert_debug = 2;
CREATE TABLE t1 (c1 VARCHAR(10));
INSERT INTO t1 VALUES (41), (42), (43), (44);
ALTER TABLE t1 ADD COLUMN (i INT);
DELETE FROM t1;
ALTER TABLE t1 ADD COLUMN (b INT);

Comment by Roel Van de Paar [ 2021-01-22 ]

rr trace available on rr box, based on alternative testcase

cd /test/MD220121-mariadb-10.6.0-linux-x86_64-dbg/rr/mysqld-0
rr replay "${PWD}"

(rr) continue;
Thread 2 received signal SIGABRT, Aborted.
[Switching to Thread 3879826.3879889]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00005e2304652859 in __GI_abort () at abort.c:79
#2  0x00005e2304652729 in __assert_fail_base (fmt=0x5e23047e8588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x55f34a45b210 "block->page.id().page_no() == index->page", 
    file=0x55f34a452bf0 "/test/10.6b_dbg/storage/innobase/handler/handler0alter.cc", line=5916, 
    function=<optimized out>) at assert.c:92
#3  0x00005e2304663f36 in __GI___assert_fail (
    assertion=assertion@entry=0x55f34a45b210 "block->page.id().page_no() == index->page", 
    file=file@entry=0x55f34a452bf0 "/test/10.6b_dbg/storage/innobase/handler/handler0alter.cc", line=line@entry=5916, 
    function=function@entry=0x55f34a45ab80 "bool innobase_instant_try(const Alter_inplace_info*, ha_innobase_inplace_ctx*, const TABLE*, const TABLE*, trx_t*)") at assert.c:101
#4  0x000055f349c82719 in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x14c53a440750, 
    ctx=ctx@entry=0x590534013f58, altered_table=altered_table@entry=0x14c53a4407f0, table=table@entry=0x59053406aa18, 
    trx=trx@entry=0x14c5441da378) at /test/10.6b_dbg/storage/innobase/include/buf0types.h:159
#5  0x000055f349c84a7e in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x14c53a440750, 
    ctx=ctx@entry=0x590534013f58, altered_table=altered_table@entry=0x14c53a4407f0, old_table=0x59053406aa18, 
    trx=trx@entry=0x14c5441da378, table_name=<optimized out>)
    at /test/10.6b_dbg/storage/innobase/handler/handler0alter.cc:10212
#6  0x000055f349c6c8c4 in ha_innobase::commit_inplace_alter_table (this=0x59053406d580, altered_table=0x14c53a4407f0, 
    ha_alter_info=0x14c53a440750, commit=true) at /test/10.6b_dbg/storage/innobase/handler/handler0alter.cc:10975
#7  0x000055f349838abd in handler::ha_commit_inplace_alter_table (this=0x59053406d580, 
    altered_table=altered_table@entry=0x14c53a4407f0, ha_alter_info=ha_alter_info@entry=0x14c53a440750, 
    commit=commit@entry=true) at /test/10.6b_dbg/sql/handler.cc:4848
#8  0x000055f34962f3cc in mysql_inplace_alter_table (thd=thd@entry=0x590534000db8, table_list=0x5905340127b0, 
    table=table@entry=0x59053406aa18, altered_table=altered_table@entry=0x14c53a4407f0, 
    ha_alter_info=ha_alter_info@entry=0x14c53a440750, target_mdl_request=target_mdl_request@entry=0x14c53a440d40, 
    alter_ctx=0x14c53a441890) at /test/10.6b_dbg/sql/sql_table.cc:8107
#9  0x000055f3496429ff in mysql_alter_table (thd=thd@entry=0x590534000db8, new_db=new_db@entry=0x590534005828, new_name=new_name@entry=0x590534005c28, create_info=create_info@entry=0x14c53a4424a0, table_list=<optimized out>, table_list@entry=0x5905340127b0, alter_info=alter_info@entry=0x14c53a4423d0, order_num=0, order=0x0, ignore=false, if_exists=false) at /test/10.6b_dbg/sql/sql_table.cc:10651
#10 0x000055f3496c83bf in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x590534000db8) at /test/10.6b_dbg/sql/structs.h:559
#11 0x000055f34956e1ec in mysql_execute_command (thd=thd@entry=0x590534000db8) at /test/10.6b_dbg/sql/sql_parse.cc:5858
#12 0x000055f349554e6e in mysql_parse (thd=thd@entry=0x590534000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14c53a4433d0) at /test/10.6b_dbg/sql/sql_parse.cc:7881
#13 0x000055f349562f0d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x590534000db8, packet=packet@entry=0x590534008d39 "ALTER TABLE t1 ADD COLUMN (b INT)", packet_length=packet_length@entry=33) at /test/10.6b_dbg/sql/sql_class.h:1293
#14 0x000055f349566236 in do_command (thd=0x590534000db8) at /test/10.6b_dbg/sql/sql_parse.cc:1348
#15 0x000055f3496c05eb in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55f34d954578, put_in_cache=put_in_cache@entry=true) at /test/10.6b_dbg/sql/sql_connect.cc:1410
#16 0x000055f3496c0cef in handle_one_connection (arg=arg@entry=0x55f34d954578) at /test/10.6b_dbg/sql/sql_connect.cc:1312
#17 0x000055f349b74f27 in pfs_spawn_thread (arg=0x55f34d87c5a8) at /test/10.6b_dbg/storage/perfschema/pfs.cc:2201
#18 0x000052580d6b1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x00005e230474f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Marko Mäkelä [ 2021-01-22 ]

In the rr replay trace, I see that the table is empty except for the metadata record, but the leaf page is not the root page. The root page (number 3) contains a pointer to the empty child page (number 4). The assertion fails, because an empty table is assumed to consist of an empty leaf page.

A similar false assumption was fixed as a part of MDEV-14126.
Roel, please check if the patch on top of bb-10.6-MDEV-24653 would fix the problem. (It should cherry-pick cleanly to 10.5 as well, but not earlier versions.) And please preserve the rr replay trace (and the executables) for further analysis. I would like to see why we missed the opportunity to shrink the tree. In any case, I think that this corruption needs to be urgently fixed in our upcoming releases, hence I would propose my quick and simple patch, and investigate the reason for the non-canonical tree state later.

Comment by Roel Van de Paar [ 2021-01-23 ]

I tested first against bb-10.6-MDEV-24653 integrally (commit a6937dfa1d7868205e3c67ab5aa0fe4d75815c09) and confirmed bug does not shows there with both single threaded plus repeated runs and multi-threaded plus repeated runs. Bug did not show. This is not exclusive/reliable, as alluded to in Marko's comment (requirement to patch previously used version with patch, and that (RV add) on the same machine), given that newer runs also did not show the issue thus far (strike-through: now have, ref below). Thus tested that at length, ref next comments.

Comment by Roel Van de Paar [ 2021-01-23 ]

Confirmed that the version used originally in this bug (9118fd360a3da0bba521caf2a35c424968235ac4) with the patch at the top of bb-10.6-MDEV-24653 seems to fix this bug (no crashes observed), and no other oddities of any kind were observed. Confirmed with both single threaded plus repeated runs and multi-threaded plus repeated runs. Reverse confirmed also. Patch looks good marko. Thank you

Comment by Roel Van de Paar [ 2021-01-23 ]

Tested the same on 10.5 (rev 139c85aafd4e4938f95843d44a455265a49b572e where the issue reproduces fine) with the same patch. Bug also disappears. Same tests executed. Patch thus works on 10.5 also.

Comment by Roel Van de Paar [ 2021-01-23 ]

Non-patched version 10.6.0 at rev 9118fd360a3da0bba521caf2a35c424968235ac4 (dbg, build 19-Jan-21) still shows the issue. It may be that issue reproducibility depends on the hardware used.

Comment by Roel Van de Paar [ 2021-01-23 ]

Non-patched version 10.6.0 at 0d7380fdac1add2be3fdb57519ccd8ac9c8e12bc (head of trunk, build today) still shows the issue. So yes seems issue reproducibility is dependent on hardware used and not on later versions. Update: hardware seems not relevant either. It is not clear to me at this point what causes the issue to reproduce in some instances and not in others. Perhaps marko knows. The fix testing is however fully stable it seems as I used both non-patched and patched builds of the same version, and issue consistently reproduces on non-patched versions and never reproduces on patched versions.

Comment by Roel Van de Paar [ 2021-01-23 ]

TLDR; patch resolves issue on both 10.6 and 10.5.

It is otherwise unclear as to why some non-patched mysqld's do not reproduce the issue. It seemed version, hardware or cmake options related, but nothing of those was conclusive, and the cause for this is likely elsewhere.

Comment by Marko Mäkelä [ 2021-01-25 ]

Roel, I created what I believe is a reproducible test case for this. The randomness that you experienced was probably related to the timing of the purge of history. I added an explicit wait for it.

--source include/have_innodb.inc
--source include/have_debug.inc
 
SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET @saved_limit = @@GLOBAL.innodb_limit_optimistic_insert_debug;
SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
SET GLOBAL innodb_limit_optimistic_insert_debug = 2;
 
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1),(2),(3),(4);
ALTER TABLE t1 ADD COLUMN b INT;
DELETE FROM t1;
--source include/wait_all_purged.inc
ALTER TABLE t1 ADD COLUMN c INT;
 
SELECT * FROM t1;
DROP TABLE t1;
SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
SET GLOBAL innodb_limit_optimistic_insert_debug = @saved_limit;

Using this test case, I figured out why we end up with this situation. Because of innodb_limit_optimistic_insert_debug = 2, in the end we will only have the hidden ADD COLUMN metadata record in page 4. The right sibling of the page is deleted in purge:

10.3 4e503aec7feaeb2297035f565ca92c98d4557577

#2  0x0000556d400afcb3 in btr_page_set_next (page=0x7f58c7720000 "", 
    page_zip=0x0, next=4294967295, mtr=0x7f589f7fd280)
    at /mariadb/10.3/storage/innobase/include/btr0btr.ic:156
#3  0x0000556d400bae6c in btr_level_list_remove_func (space=5, page_size=
      @0x7f589f7fc8e0: {m_physical = 16384, m_logical = 16384, m_is_compressed = 0}, page=0x7f58c7724000 "", index=0x7f58bc0801e0, mtr=0x7f589f7fd280)
    at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:3194
#4  0x0000556d400bfa0a in btr_discard_page (cursor=0x556d42947578, 
    mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0btr.cc:4215
#5  0x0000556d400f59f7 in btr_cur_pessimistic_delete (err=0x7f589f7fcfe0, 
    has_reserved_extents=0, cursor=0x556d42947578, flags=0, rollback=false, 
    mtr=0x7f589f7fd280) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:5772
#6  0x0000556d3ffb9d0c in row_purge_remove_clust_if_poss_low (
    node=0x556d429474d0, mode=65569)
    at /mariadb/10.3/storage/innobase/row/row0purge.cc:149
#7  0x0000556d3ffb6771 in row_purge_remove_clust_if_poss (node=0x556d429474d0)
    at /mariadb/10.3/storage/innobase/row/row0purge.cc:198
#8  0x0000556d3ffb4e4e in row_purge_del_mark (node=0x556d429474d0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:781

At this point, we are missing the opportunity to raise the surviving sibling page up by one level, which would be the opposite of to the btr_page_split_and_insert() operation.

I think that my original proposed fix is the lowest-risk resolution for this bug. Optimizing the B-tree space management (in the above stack trace) would be rather risky and best done in a development branch, not in GA versions.

Comment by Roel Van de Paar [ 2021-01-25 ]

Thank you marko.

Comment by Marko Mäkelä [ 2021-01-25 ]

I adjusted btr_pcur_store_position() as well.

Comment by Marko Mäkelä [ 2021-01-25 ]

This fix is now merged up to the 10.6 branch. I filed MDEV-24673 for the failure to shrink the index tree.

Generated at Thu Feb 08 09:31:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.