[MDEV-25522] Assertion `page_is_leaf(block->frame)' in lock_rec_print Created: 2021-04-26  Updated: 2021-04-28  Resolved: 2021-04-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.0
Fix Version/s: 10.6.1

Type: Bug Priority: Blocker
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression-10.6, rr-profile-analyzed

Issue Links:
Problem/Incident
is caused by MDEV-24589 DROP TABLE is not crash-safe Closed

 Description   

Problem hit during RQG testing on 10.6.
mysqld: /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:3985: void lock_rec_print(FILE*, const ib_lock_t*, mtr_t&): Assertion `page_is_leaf(block->frame)' failed.
 
Query (0x62b0002a7238): DELETE FROM t7 WHERE col_int = 4 OR col_int IS NULL
Status: NOT_KILLED
 
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fffd6cdf859 in __GI_abort () at abort.c:79
#2  0x00007fffd6cdf729 in __assert_fail_base (fmt=0x7fffd6e75588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55a78b92b040 "page_is_leaf(block->frame)", file=0x55a78b9250c0 "/data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc", 
    line=3985, function=<optimized out>) at assert.c:92
#3  0x00007fffd6cf0f36 in __GI___assert_fail (assertion=0x55a78b92b040 "page_is_leaf(block->frame)", file=0x55a78b9250c0 "/data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc", line=3985, 
    function=0x55a78b92ae20 "void lock_rec_print(FILE*, const ib_lock_t*, mtr_t&)") at assert.c:101
#4  0x000055a78a42777f in lock_rec_print (file=0x615000002880, lock=0x6cdb7c397ef0, mtr=...) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:3985
#5  0x000055a78a434380 in Deadlock::print (lock=...) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:5746
#6  0x000055a78a434ede in Deadlock::report (trx=0x6cdb7c3a14c0, current_trx=true) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:5866
#7  0x000055a78a435a0e in Deadlock::check_and_resolve (trx=0x6cdb7c3a14c0) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:5943
#8  0x000055a78a41651d in lock_wait (thr=0x62100026f6c8) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:1766
#9  0x000055a78a612fd3 in row_mysql_handle_errors (new_err=0x47010f93d240, trx=0x6cdb7c3a14c0, thr=0x62100026f6c8, savept=0x0) at /data/Server/10.6alpha/storage/innobase/row/row0mysql.cc:745
#10 0x000055a78a6a868b in row_search_mvcc (buf=0x61a0002034b8 "\350'", mode=PAGE_CUR_G, prebuilt=0x62100026e988, match_mode=0, direction=1) at /data/Server/10.6alpha/storage/innobase/row/row0sel.cc:5758
#11 0x000055a78a2b1e01 in ha_innobase::general_fetch (this=0x61d000a8b6b8, buf=0x61a0002034b8 "\350'", direction=1, match_mode=0) at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:8875
#12 0x000055a78a2b2877 in ha_innobase::rnd_next (this=0x61d000a8b6b8, buf=0x61a0002034b8 "\350'") at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:9079
#13 0x000055a78999515a in handler::ha_rnd_next (this=0x61d000a8b6b8, buf=0x61a0002034b8 "\350'") at /data/Server/10.6alpha/sql/handler.cc:3068
#14 0x000055a789da536f in rr_sequential (info=0x47010f93e750) at /data/Server/10.6alpha/sql/records.cc:519
#15 0x000055a78907bd7e in READ_RECORD::read_record (this=0x47010f93e750) at /data/Server/10.6alpha/sql/records.h:81
#16 0x000055a789df22aa in mysql_delete (thd=0x62b0002a0218, table_list=0x62b0002a73c0, conds=0x62b0002a8138, order_list=0x62b0002a4e30, limit=18446744073709551614, options=0, result=0x0) at /data/Server/10.6alpha/sql/sql_delete.cc:782
#17 0x000055a789210b42 in mysql_execute_command (thd=0x62b0002a0218) at /data/Server/10.6alpha/sql/sql_parse.cc:4801
#18 0x000055a789225738 in mysql_parse (thd=0x62b0002a0218, rawbuf=0x62b0002a7238 "DELETE FROM t7 WHERE col_int = 4 OR col_int IS NULL  /* E_R Thread27 QNO 111 CON_ID 42 */", length=89, parser_state=0x47010f93fb20)
    at /data/Server/10.6alpha/sql/sql_parse.cc:8018
#19 0x000055a7891fe022 in dispatch_command (command=COM_QUERY, thd=0x62b0002a0218, packet=0x629000e7e219 " DELETE FROM t7 WHERE col_int = 4 OR col_int IS NULL  /* E_R Thread27 QNO 111 CON_ID 42 */ ", packet_length=91, blocking=true)
    at /data/Server/10.6alpha/sql/sql_parse.cc:1897
#20 0x000055a7891fb3f7 in do_command (thd=0x62b0002a0218, blocking=true) at /data/Server/10.6alpha/sql/sql_parse.cc:1406
#21 0x000055a7895dfb70 in do_handle_one_connection (connect=0x608000003bb8, put_in_cache=true) at /data/Server/10.6alpha/sql/sql_connect.cc:1410
#22 0x000055a7895df4d2 in handle_one_connection (arg=0x608000003bb8) at /data/Server/10.6alpha/sql/sql_connect.cc:1312
#23 0x00007fffd781d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#24 0x00007fffd6ddc293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
 
sdp:/data/Results/1619433372/TBR-1034/dev/shm/vardir/1619433372/88/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
 
10.6 1a647b700f6b72dc97211510a5d0c647d5d3d911 2021-04-23T10:07:08+03:00
 
RQG
-------
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--redefine=conf/mariadb/xa.yy \
--mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=240 \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=$RQG_HOME/home/mleich/RQG_O/conf/mariadb/encryption_keys.txt \
--duration=300 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--innodb_stats_persistent=on \
--mysqld=--innodb_adaptive_hash_index=on \
--mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
--mysqld=--loose-max-statement-time=30 \
--threads=33 \
--mysqld=--innodb-use-native-aio=0 \
--rr=Extended \
--rr_options="--chaos --wait" \
--mysqld=--innodb_page_size=32K \
--mysqld=--innodb-buffer-pool-size=24M \
--duration=300 \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_



 Comments   
Comment by Matthias Leich [ 2021-04-26 ]

Error pattern for RQG
[ 'TBR-1034-MDEV-25522' , 'mysqld: .{1,150}lock0lock.cc.{1,30} void lock_rec_print.{1,100}: Assertion \`page_is_leaf\(block->frame\)\' failed.' ],
Archive: sdp:/data/Results/1619433372/002108.tar.xz

Comment by Marko Mäkelä [ 2021-04-26 ]

The problem is that when we are purging a DROP INDEX operation (or the rollback of a failed online ADD UNIQUE INDEX), we fail to discard the locks that existed on the index pages. In the execution trace, page 4 was a secondary index root page when the explicit lock object was created. It would later be reallocated as a BLOB page and filled with some data, which would trigger the debug assertion failure when the stale record lock was accessed.

Comment by Marko Mäkelä [ 2021-04-27 ]

At the time the record lock was created, we had index->online_status=0 but index->uncommitted. During the record lock creation, the ALTER TABLE was waiting for a metadata lock in the SQL layer:

mariadb-10.6.0

#14 0x000055a7895fee54 in MDL_wait::timed_wait (this=0x62b00023e348, owner=0x62b00023e2f8, abs_timeout=0x46fc39e22bf0, set_status_on_timeout=true, wait_state_name=0x55a78c4d0b10 <MDL_key::m_namespace_to_wait_state_name+48>) at /data/Server/10.6alpha/sql/mdl.cc:1166
#15 0x000055a7896031d5 in MDL_context::acquire_lock (this=0x62b00023e348, mdl_request=0x46fc39e22d40, lock_wait_timeout=2) at /data/Server/10.6alpha/sql/mdl.cc:2356
#16 0x000055a789603cec in MDL_context::upgrade_shared_lock (this=0x62b00023e348, mdl_ticket=0x607000091e30, new_type=MDL_EXCLUSIVE, lock_wait_timeout=2) at /data/Server/10.6alpha/sql/mdl.cc:2535
#17 0x000055a789092643 in wait_while_table_is_used (thd=0x62b00023e218, table=0x619000642898, function=HA_EXTRA_PREPARE_FOR_RENAME) at /data/Server/10.6alpha/sql/sql_base.cc:1311
#18 0x000055a789472337 in mysql_inplace_alter_table (thd=0x62b00023e218, table_list=0x62b000245438, table=0x619000642898, altered_table=0x46fc39e24840, ha_alter_info=0x46fc39e24150, target_mdl_request=0x46fc39e24240, alter_ctx=0x46fc39e252b0) at /data/Server/10.6alpha/sql/sql_table.cc:8133
#19 0x000055a78948476b in mysql_alter_table (thd=0x62b00023e218, new_db=0x62b000242c38, new_name=0x62b000243060, create_info=0x46fc39e26640, table_list=0x62b000245438, alter_info=0x46fc39e26510, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/10.6alpha/sql/sql_table.cc:10788
#20 0x000055a7895f81ef in Sql_cmd_alter_table::execute (this=0x62b000245c98, thd=0x62b00023e218) at /data/Server/10.6alpha/sql/sql_alter.cc:540
#21 0x000055a7892192ad in mysql_execute_command (thd=0x62b00023e218) at /data/Server/10.6alpha/sql/sql_parse.cc:5986
#22 0x000055a789225738 in mysql_parse (thd=0x62b00023e218, rawbuf=0x62b000245238 "ALTER TABLE t1 ADD UNIQUE INDEX IF NOT EXISTS `Marvão_uidx1` ( col_text(9) ), LOCK = DEFAULT, ALGORITHM = NOCOPY  /* E_R Thread20 QNO 27 CON_ID 35 */", length=150, parser_state=0x46fc39e27b20) at /data/Server/10.6alpha/sql/sql_parse.cc:8018

By design, one of the MDL holders was the INSERT:

#0  lock_rec_create_low (c_lock=0x61900001a6a8, type_mode=258, page_id=..., page=0x72134b670000 "", heap_no=3, index=0x61600495f108, trx=0x6cdb7c397a58, holds_trx_mutex=true)
    at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:1185
#1  0x000055a78a4133e2 in lock_rec_enqueue_waiting (c_lock=0x61900001a6a8, type_mode=2, id=..., page=0x72134b670000 "", heap_no=3, index=0x61600495f108, thr=0x62100022fac8, prdt=0x0)
    at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:1280
#2  0x000055a78a414df7 in lock_rec_lock (impl=false, mode=2, block=0x72134ab0f728, heap_no=3, index=0x61600495f108, thr=0x62100022fac8) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:1516
#3  0x000055a78a430592 in lock_sec_rec_read_check_and_lock (flags=0, block=0x72134ab0f728, rec=0x72134b670093 "444444444\200", index=0x61600495f108, offsets=0x32717e863920, mode=LOCK_S, gap_mode=0, 
    thr=0x62100022fac8) at /data/Server/10.6alpha/storage/innobase/lock/lock0lock.cc:5177
#4  0x000055a78a5c74a2 in row_ins_set_shared_rec_lock (type=0, block=0x72134ab0f728, rec=0x72134b670093 "444444444\200", index=0x61600495f108, offsets=0x32717e863920, thr=0x62100022fac8)
    at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:1407
#5  0x000055a78a5ca4de in row_ins_scan_sec_index_for_duplicate (flags=0, index=0x61600495f108, entry=0x616006865408, thr=0x62100022fac8, s_latch=true, mtr=0x32717e864090, offsets_heap=0x6190004ebb80)
    at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:2147
#6  0x000055a78a5cfae3 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x61600495f108, offsets_heap=0x6190004ebb80, heap=0x6190004ec080, entry=0x616006865408, trx_id=0, thr=0x62100022fac8)
    at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:3072
#7  0x000055a78a5d0f27 in row_ins_sec_index_entry (index=0x61600495f108, entry=0x616006865408, thr=0x62100022fac8, check_foreign=true) at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:3346
#8  0x000055a78a5d1306 in row_ins_index_entry (index=0x61600495f108, entry=0x616006865408, thr=0x62100022fac8) at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:3394
#9  0x000055a78a5d2334 in row_ins_index_entry_step (node=0x62100022f5f8, thr=0x62100022fac8) at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:3561
#10 0x000055a78a5d2cf3 in row_ins (node=0x62100022f5f8, thr=0x62100022fac8) at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:3707
#11 0x000055a78a5d3e1b in row_ins_step (thr=0x62100022fac8) at /data/Server/10.6alpha/storage/innobase/row/row0ins.cc:3853
#12 0x000055a78a617634 in row_insert_for_mysql (mysql_rec=0x61a0002046b8 "\350\061", prebuilt=0x62100022ed88, ins_mode=ROW_INS_NORMAL) at /data/Server/10.6alpha/storage/innobase/row/row0mysql.cc:1395
#13 0x000055a78a2a9482 in ha_innobase::write_row (this=0x61d000d232b8, record=0x61a0002046b8 "\350\061") at /data/Server/10.6alpha/storage/innobase/handler/ha_innodb.cc:7430
#14 0x000055a7899b19c8 in handler::ha_write_row (this=0x61d000d232b8, buf=0x61a0002046b8 "\350\061") at /data/Server/10.6alpha/sql/handler.cc:7162
#15 0x000055a78915b369 in write_record (thd=0x62b0001ab218, table=0x6190004e9d98, info=0x32717e865eb0, sink=0x0) at /data/Server/10.6alpha/sql/sql_insert.cc:2106
#16 0x000055a789153d49 in mysql_insert (thd=0x62b0001ab218, table_list=0x62b0001b24b8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0)
    at /data/Server/10.6alpha/sql/sql_insert.cc:1099
#17 0x000055a78920e80c in mysql_execute_command (thd=0x62b0001ab218) at /data/Server/10.6alpha/sql/sql_parse.cc:4559
#18 0x000055a789225738 in mysql_parse (thd=0x62b0001ab218, 
    rawbuf=0x62b0001b2238 "INSERT INTO t1 (col1,col2, col_int, col_string, col_text) VALUES ( 49, 49, 49, REPEAT(SUBSTR(CAST( 49 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 49 - 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread10 QNO "..., length=215, parser_state=0x32717e866b20) at /data/Server/10.6alpha/sql/sql_parse.cc:8018

Yes, it may feel a little strange to allow locks to be created on an uncommitted index, but that is how the logic always worked.

What would have happened before MDEV-24589 was implemented? We would have renamed the SYS_INDEXES.NAME for this uncommitted index to start with the byte 0xff, which is an invalid UTF-8 sequence, and that placeholder entry along with the index tree would have been removed later, while the table is not being used by any concurrent connection.

Discarding locks does not feel nice, but it seems that in this case we either have to do that or avoid acquiring locks on uncommitted indexes in the first place. It feels safer to break the locks when the uncommitted index pages are purged.

Comment by Marko Mäkelä [ 2021-04-27 ]

I think that the simplest and most efficient fix is to discard the record locks on the uncommitted index already in row_purge_remove_clust_if_poss_low() while not holding a latch on the SYS_INDEXES record. The code seems to be reasonably well covered by the test stress.ddl_innodb.

Generated at Thu Feb 08 09:38:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.