[MDEV-24671] Assertion failure in lock_wait_table_reserve_slot() Created: 2021-01-25  Updated: 2023-10-03  Resolved: 2021-01-27

Status: Closed
Project: MariaDB Server
Component/s: Locking, Replication, Storage Engine - InnoDB, XA
Affects Version/s: 10.6
Fix Version/s: 10.6.0

Type: Bug Priority: Blocker
Reporter: Roel Van de Paar Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.2, not-10.3, not-10.4, not-10.5, regression

Issue Links:
Blocks
blocks MDEV-20612 Improve InnoDB lock_sys scalability Closed
blocks MDEV-24738 Improve the InnoDB deadlock checker Closed
Problem/Incident
causes MDEV-24789 Performance regression after MDEV-24671 Closed
causes MDEV-24985 Shutdown fails to abort current InnoD... Closed
causes MDEV-29311 Server Status Innodb_row_lock_time% i... Closed
causes MDEV-32096 Parallel replication lags because inn... Closed
Relates
relates to MDEV-24948 thd_need_wait_reports() hurts perform... Open
relates to MDEV-24915 Galera conflict resolution is unneces... Closed

 Description   

Tables are auto-defined as InnoDB, no --log-bin option needed. Opt+dbg affected.

CREATE TABLE t1 (c1 VARCHAR(10));
CREATE DEFINER=root@localhost EVENT e1 ON SCHEDULE EVERY '1' SECOND COMMENT 'a' DO DELETE FROM t1; 
SET GLOBAL innodb_lock_wait_timeout=347;
XA START 'b';
SET SESSION max_statement_time=65535;
INSERT INTO t1 VALUES (1),(2),(1);
SET GLOBAL event_scheduler=on;
CHANGE MASTER TO master_host='127.0.0.1';
START SLAVE sql_thread;
SELECT MASTER_POS_WAIT('MASTER-bin.000001', 1116, 300);

Leads to:

10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

2021-01-25 14:10:05 0 [Note] /test/MD010121-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.6.0-MariaDB-debug'  socket: '/test/MD010121-mariadb-10.6.0-linux-x86_64-dbg/socket.sock'  port: 14429  MariaDB Server
2021-01-25 14:10:41 5 [Note] Event Scheduler: scheduler thread started with id 5
2021-01-25 14:10:41 4 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
2021-01-25 14:10:41 4 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--log-basename=#' or '--relay-log=qa-roel-2-relay-bin' to avoid this problem.
2021-01-25 14:10:41 4 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='3306', master_log_file='', master_log_pos='4'.
2021-01-25 14:10:41 7 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './qa-roel-2-relay-bin.000001' position: 4
2021-01-25 14:15:22 288 [ERROR] InnoDB: There appear to be 281 user threads currently waiting inside InnoDB, which is the upper limit. Cannot continue operation. Before aborting, we print a list of waiting threads.
Slot 0: in use 1, timeout 347, time 281
... counts down in time ...
Slot 279: in use 1, timeout 347, time 2
Slot 280: in use 1, timeout 347, time 1
2021-01-25 14:15:22 0x14546116a700  InnoDB: Assertion failure in file /test/10.6_dbg/storage/innobase/lock/lock0wait.cc line 171
InnoDB: We intentionally generate a memory trap.
...
210125 14:15:22 [ERROR] mysqld got signal 6 ;
...
Query (0x1452fc019050): DELETE FROM t1

10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Debug)

Core was generated by `/test/MD010121-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x14e1b27f9700 (LWP 2305061))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x000055598f0350d7 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
#2  0x000055598e7c9ab1 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:330
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x000014e51947c859 in __GI_abort () at abort.c:79
#6  0x000055598ee1478f in ut_dbg_assertion_failed (expr=expr@entry=0x0, file=file@entry=0x55598f3fa818 "/test/10.6_dbg/storage/innobase/lock/lock0wait.cc", line=line@entry=171) at /test/10.6_dbg/storage/innobase/ut/ut0dbg.cc:60
#7  0x000055598ec6fe57 in lock_wait_table_reserve_slot (thr=thr@entry=0x14e18801c550, wait_timeout=wait_timeout@entry=347) at /test/10.6_dbg/storage/innobase/lock/lock0wait.cc:171
#8  0x000055598ec701e6 in lock_wait_suspend_thread (thr=thr@entry=0x14e18801c550) at /test/10.6_dbg/storage/innobase/lock/lock0wait.cc:267
#9  0x000055598ed4d94e in row_mysql_handle_errors (new_err=new_err@entry=0x14e1b27f5aac, trx=trx@entry=0x14e511b5a7e8, thr=thr@entry=0x14e18801c550, savept=savept@entry=0x0) at /test/10.6_dbg/storage/innobase/row/row0mysql.cc:736
#10 0x000055598ed98af6 in row_search_mvcc (buf=buf@entry=0x14e18801b1e8 "\377", mode=PAGE_CUR_G, prebuilt=0x14e18801bea8, match_mode=<optimized out>, direction=direction@entry=0) at /test/10.6_dbg/storage/innobase/row/row0sel.cc:5666
#11 0x000055598ebd8ecc in ha_innobase::index_read (this=this@entry=0x14e18801b650, buf=buf@entry=0x14e18801b1e8 "\377", key_ptr=key_ptr@entry=0x0, key_len=key_len@entry=0, find_flag=find_flag@entry=HA_READ_AFTER_KEY) at /test/10.6_dbg/storage/innobase/handler/ha_innodb.cc:8561
#12 0x000055598ebd9152 in ha_innobase::index_first (this=this@entry=0x14e18801b650, buf=buf@entry=0x14e18801b1e8 "\377") at /test/10.6_dbg/storage/innobase/handler/ha_innodb.cc:8923
#13 0x000055598ebd91db in ha_innobase::rnd_next (this=0x14e18801b650, buf=0x14e18801b1e8 "\377") at /test/10.6_dbg/storage/innobase/handler/ha_innodb.cc:9016
#14 0x000055598e7d1796 in handler::ha_rnd_next (this=0x14e18801b650, buf=0x14e18801b1e8 "\377") at /test/10.6_dbg/sql/handler.cc:3066
#15 0x000055598e9a6462 in rr_sequential (info=0x14e1b27f69e0) at /test/10.6_dbg/sql/records.h:82
#16 0x000055598e9c3cc5 in READ_RECORD::read_record (this=0x14e1b27f69e0) at /test/10.6_dbg/sql/records.h:81
#17 mysql_delete (thd=thd@entry=0x14e4d0d599e8, table_list=0x14e1880121c0, conds=<optimized out>, order_list=order_list@entry=0x14e188014a78, limit=18446744073709551615, options=<optimized out>, result=<optimized out>) at /test/10.6_dbg/sql/sql_delete.cc:774
#18 0x000055598e5071c8 in mysql_execute_command (thd=0x14e4d0d599e8) at /test/10.6_dbg/sql/sql_limit.h:67
#19 0x000055598e420a23 in sp_instr_stmt::exec_core (this=0x14e188012890, thd=<optimized out>, nextp=0x14e1b27f7bfc) at /test/10.6_dbg/sql/sp_head.cc:3768
#20 0x000055598e42e213 in sp_lex_keeper::reset_lex_and_exec_core (this=this@entry=0x14e1880128d8, thd=thd@entry=0x14e4d0d599e8, nextp=nextp@entry=0x14e1b27f7bfc, open_tables=open_tables@entry=false, instr=instr@entry=0x14e188012890) at /test/10.6_dbg/sql/sp_head.cc:3495
#21 0x000055598e42edde in sp_instr_stmt::execute (this=0x14e188012890, thd=0x14e4d0d599e8, nextp=0x14e1b27f7bfc) at /test/10.6_dbg/sql/sp_head.cc:3674
#22 0x000055598e426a00 in sp_head::execute (this=this@entry=0x14e188011540, thd=thd@entry=0x14e4d0d599e8, merge_da_on_success=merge_da_on_success@entry=true) at /test/10.6_dbg/sql/sp_head.cc:1437
#23 0x000055598e428aa2 in sp_head::execute_procedure (this=this@entry=0x14e188011540, thd=thd@entry=0x14e4d0d599e8, args=args@entry=0x14e1b27f83b0) at /test/10.6_dbg/sql/sp_head.cc:2449
#24 0x000055598e6351b5 in Event_job_data::execute (this=this@entry=0x14e1b27f8b40, thd=thd@entry=0x14e4d0d599e8, drop=<optimized out>) at /test/10.6_dbg/sql/event_data_objects.cc:1484
#25 0x000055598e9f111d in Event_worker_thread::run (this=this@entry=0x14e1b27f8ddf, thd=thd@entry=0x14e4d0d599e8, event=event@entry=0x14e4d0d59570) at /test/10.6_dbg/sql/event_scheduler.cc:323
#26 0x000055598e9f1309 in event_worker_thread (arg=arg@entry=0x14e4d0d59570) at /test/10.6_dbg/sql/event_scheduler.cc:268
#27 0x000055598eb1488f in pfs_spawn_thread (arg=0x14e4d0d65488) at /test/10.6_dbg/storage/perfschema/pfs.cc:2201
#28 0x000014e51998a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x000014e519579293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6.0 9118fd360a3da0bba521caf2a35c424968235ac4 (Optimized)

Core was generated by `/test/MD010121-mariadb-10.6.0-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x14b043a6e700 (LWP 35997))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000557b579c666f in my_write_core (sig=sig@entry=6) at /test/10.6_opt/mysys/stacktrace.c:424
#2  0x0000557b5743bd20 in handle_fatal_signal (sig=6) at /test/10.6_opt/sql/signal_handler.cc:330
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x000014b4628fc859 in __GI_abort () at abort.c:79
#6  0x0000557b57101db2 in ut_dbg_assertion_failed (expr=expr@entry=0x0, file=file@entry=0x557b57cc2d90 "/test/10.6_opt/storage/innobase/lock/lock0wait.cc", line=line@entry=171) at /test/10.6_opt/storage/innobase/ut/ut0dbg.cc:60
#7  0x0000557b570e0c6c in lock_wait_table_reserve_slot (thr=<optimized out>, wait_timeout=<optimized out>) at /test/10.6_opt/storage/innobase/lock/lock0wait.cc:171
#8  0x0000557b577b1e7c in lock_wait_suspend_thread (thr=thr@entry=0x14b0cc016f20) at /test/10.6_opt/storage/innobase/lock/lock0wait.cc:267
#9  0x0000557b57814180 in row_mysql_handle_errors (new_err=new_err@entry=0x14b043a6aca4, trx=trx@entry=0x14b45b0083f8, thr=thr@entry=0x14b0cc016f20, savept=savept@entry=0x0) at /test/10.6_opt/storage/innobase/row/row0mysql.cc:736
#10 0x0000557b57830c89 in row_search_mvcc (buf=buf@entry=0x14b0cc015c78 "\377", mode=mode@entry=PAGE_CUR_G, prebuilt=0x14b0cc0168a0, match_mode=match_mode@entry=0, direction=direction@entry=0) at /test/10.6_opt/storage/innobase/row/row0sel.cc:5666
#11 0x0000557b5776ce00 in ha_innobase::index_read (find_flag=HA_READ_AFTER_KEY, key_len=0, key_ptr=0x0, buf=0x14b0cc015c78 "\377", this=0x14b0cc016070) at /test/10.6_opt/storage/innobase/handler/ha_innodb.cc:8561
#12 ha_innobase::index_first (buf=0x14b0cc015c78 "\377", this=0x14b0cc016070) at /test/10.6_opt/storage/innobase/handler/ha_innodb.cc:8923
#13 ha_innobase::rnd_next (buf=0x14b0cc015c78 "\377", this=0x14b0cc016070) at /test/10.6_opt/storage/innobase/handler/ha_innodb.cc:9016
#14 ha_innobase::rnd_next (this=0x14b0cc016070, buf=0x14b0cc015c78 "\377") at /test/10.6_opt/storage/innobase/handler/ha_innodb.cc:9006
#15 0x0000557b5744227f in handler::ha_rnd_next (this=0x14b0cc016070, buf=0x14b0cc015c78 "\377") at /test/10.6_opt/sql/handler.cc:3066
#16 0x0000557b575961a6 in rr_sequential (info=0x14b043a6bae0) at /test/10.6_opt/sql/records.h:82
#17 0x0000557b575abed2 in READ_RECORD::read_record (this=0x14b043a6bae0) at /test/10.6_opt/sql/records.h:81
#18 mysql_delete (thd=thd@entry=0x14b404ca5fc8, table_list=0x14b0cc00d7d0, conds=<optimized out>, order_list=order_list@entry=0x14b0cc00fc98, limit=18446744073709551615, options=<optimized out>, result=0x0) at /test/10.6_opt/sql/sql_delete.cc:774
#19 0x0000557b57231ee0 in mysql_execute_command (thd=0x14b404ca5fc8) at /test/10.6_opt/sql/sql_limit.h:67
#20 0x0000557b57186349 in sp_instr_stmt::exec_core (this=0x14b0cc00dea0, thd=<optimized out>, nextp=0x14b043a6cc84) at /test/10.6_opt/sql/sp_head.cc:3768
#21 0x0000557b5718f275 in sp_lex_keeper::reset_lex_and_exec_core (this=0x14b0cc00dee8, thd=0x14b404ca5fc8, nextp=0x14b043a6cc84, open_tables=<optimized out>, instr=0x14b0cc00dea0) at /test/10.6_opt/sql/sp_head.cc:3495
#22 0x0000557b5718fc91 in sp_instr_stmt::execute (this=0x14b0cc00dea0, thd=0x14b404ca5fc8, nextp=0x14b043a6cc84) at /test/10.6_opt/sql/sp_head.cc:3674
#23 0x0000557b57189bdc in sp_head::execute (this=0x14b0cc00cb50, thd=0x14b404ca5fc8, merge_da_on_success=true) at /test/10.6_opt/sql/sp_head.cc:1437
#24 0x0000557b5718b2aa in sp_head::execute_procedure (this=this@entry=0x14b0cc00cb50, thd=thd@entry=0x14b404ca5fc8, args=args@entry=0x14b043a6d3e0) at /test/10.6_opt/sql/sp_head.cc:2449
#25 0x0000557b573177a4 in Event_job_data::execute (this=this@entry=0x14b043a6db40, thd=thd@entry=0x14b404ca5fc8, drop=<optimized out>) at /test/10.6_opt/sql/event_data_objects.cc:1484
#26 0x0000557b575c9bd5 in Event_worker_thread::run (this=0x14b043a6dddf, thd=0x14b404ca5fc8, event=0x14b404c9a720) at /test/10.6_opt/sql/event_scheduler.cc:323
#27 0x0000557b575c9cbf in event_worker_thread (arg=arg@entry=0x14b404c9a720) at /test/10.6_opt/sql/event_scheduler.cc:268
#28 0x0000557b576b9d76 in pfs_spawn_thread (arg=0x14b404cb1418) at /test/10.6_opt/storage/perfschema/pfs.cc:2201
#29 0x000014b462e0a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#30 0x000014b4629f9293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.6.0 (dbg), 10.6.0 (opt)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.2.37 (dbg), 10.2.37 (opt), 10.3.28 (dbg), 10.3.28 (opt), 10.4.18 (dbg), 10.4.18 (opt), 10.5.9 (dbg), 10.5.9 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.50 (dbg), 5.6.50 (opt), 5.7.32 (dbg), 5.7.32 (opt), 8.0.22 (dbg), 8.0.22 (opt)

10.5.9 Is not affected:

10.5.9 139c85aafd4e4938f95843d44a455265a49b572e (Debug)

10.5.9>SELECT MASTER_POS_WAIT('MASTER-bin.000001', 1116, 300);
+-------------------------------------------------+
| MASTER_POS_WAIT('MASTER-bin.000001', 1116, 300) |
+-------------------------------------------------+
|                                              -1 |
+-------------------------------------------------+
1 row in set (5 min 0.001 sec)



 Comments   
Comment by Marko Mäkelä [ 2021-01-25 ]

This code will be refactored in MDEV-20612, as part of removing lock_wait_timeout_task. The new function lock_wait() will simply invoke mysql_cond_timedwait() or mysql_cond_wait() on trx->lock.cond while holding lock_sys.wait_mutex. There will be no separate lock wait array and no code to deal with it.

That said, I wonder why you were not able to repeat this in 10.5 or earlier.

Comment by Marko Mäkelä [ 2021-01-26 ]

I think that the easiest fix is to remove the ‘slots’  altogether and simply rewrite lock waits around mysql_cond_wait(&trx->lock.cond, &lock_sys.wait_mutex). That is a prerequisite for MDEV-20612 as well.

Comment by Matthias Leich [ 2021-01-27 ]

Results of RQG testing on origin/bb-10.6-MDEV-24671 78c174785a7f05b4e97470be4f68d6326e6ef51f 2021-01-26T17:48:14+02:00
Only failures which were seen on other 10.6 trees too.

Comment by Roel Van de Paar [ 2021-01-27 ]

Marko asked if I could maybe give bb-10.6-MDEV-24671 a spin too. I see five four three two zero seemingly new issues not present in an equal duration 6.0 run, nor seen in other previous 6.0 runs:

[REF_NEXT_COMMENT] !node->table->skip_alter_undo|SIGABRT|row_upd_clust_rec|row_upd_clust_step|row_upd|row_upd_step                                                                          
[REF_NEXT_COMMENT] (block)->index || (block)->n_pointers == 0|SIGABRT|ut_dbg_assertion_failed|btr_search_drop_page_hash_index|btr_free_root|btr_free                                        
[REF_NEXT_COMMENT] (block)->index || (block)->n_pointers == 0|SIGABRT|ut_dbg_assertion_failed|btr_search_drop_page_hash_index|btr_search_build_page_hash_index|btr_search_info_update_slow  
[REF_NEXT_COMMENT] thr->graph->trx->id == trx_read_trx_id( static_cast<const byte*>( trx_id->data)) || static_cast<ins_node_t*>( thr->run_node)->bulk_insert|SIGABRT|btr_cur_optimistic_insert|row_ins_clust_index_entry_low|row_ins_clust_index_entry|row_ins_index_entry
[REF_NEXT_COMMENT] thr_get_trx(thr)->id || index->table->no_rollback()|SIGABRT|row_ins_index_entry|row_ins_index_entry_step|row_ins|row_ins_step

Reducing. Branch rev: 78c174785a7f05b4e97470be4f68d6326e6ef51f

Comment by Roel Van de Paar [ 2021-01-27 ]

Further findings:

!node->table->skip_alter_undo|SIGABRT|row_upd_clust_rec|row_upd_clust_step|row_upd|row_upd_step                                                                          
(block)->index || (block)->n_pointers == 0|SIGABRT|ut_dbg_assertion_failed|btr_search_drop_page_hash_index|btr_free_root|btr_free                                        
(block)->index || (block)->n_pointers == 0|SIGABRT|ut_dbg_assertion_failed|btr_search_drop_page_hash_index|btr_search_build_page_hash_index|btr_search_info_update_slow  
thr->graph->trx->id == trx_read_trx_id( static_cast<const byte*>( trx_id->data)) || static_cast<ins_node_t*>( thr->run_node)->bulk_insert|SIGABRT|btr_cur_optimistic_insert|row_ins_clust_index_entry_low|row_ins_clust_index_entry|row_ins_index_entry
thr_get_trx(thr)->id || index->table->no_rollback()|SIGABRT|row_ins_index_entry|row_ins_index_entry_step|row_ins|row_ins_step

> Were all found in normal 6.0 also: MDEV-24715, MDEV-22255 subcase, MDEV-24716, MDEV-20203 subcase

Generated at Thu Feb 08 09:31:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.