Details
-
Bug
-
Status: Confirmed (View Workflow)
-
Critical
-
Resolution: Unresolved
-
10.11, 11.4, 11.8, 12.1, 12.2
-
Can result in hang or crash
Description
MDEV-24621 introduced a lock order inversion bug. It is somewhat obscured, because the call stack involves multiple mtr_t objects.
saahil produced the following stack trace of a hang, which occurred during a test of the final version of MDEV-37244:
ssh sdp
|
gdb -c /data/results/1760698156/MDEV-37900/1/data/core /Server_bin/MDEV-37244_debug_Og/bin/mariadbd
|
|
11.4 6087bdae8ce30614b9933c258238c80d440bccad |
Thread 11 (Thread 0x702a283116c0 (LWP 3968788)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000584418f74ee0 in ssux_lock_impl<true>::wait (this=this@entry=0x7029c0ceba48, lk=lk@entry=0x80000001) at /data/Server/MDEV-37244/storage/innobase/sync/srw_lock.cc:261
|
#2 0x0000584418f75087 in ssux_lock_impl<true>::wr_wait (this=this@entry=0x7029c0ceba48, lk=0x80000001) at /data/Server/MDEV-37244/storage/innobase/sync/srw_lock.cc:371
|
#3 0x0000584418dfceda in ssux_lock_impl<true>::wr_lock (this=this@entry=0x7029c0ceba48) at /data/Server/MDEV-37244/storage/innobase/include/srw_lock.h:286
|
#4 0x0000584418dfea5b in sux_lock<ssux_lock_impl<true> >::x_lock (this=this@entry=0x7029c0ceba48, for_io=for_io@entry=0x0) at /data/Server/MDEV-37244/storage/innobase/include/sux_lock.h:436
|
#5 0x0000584418f34767 in mtr_t::x_lock (this=this@entry=0x702a2830db90, lock=0x7029c0ceba48) at /data/Server/MDEV-37244/storage/innobase/include/mtr0mtr.h:272
|
#6 0x0000584418fbbfe2 in BtrBulk::finish (this=this@entry=0x702a2830e390, err=err@entry=DB_SUCCESS) at /data/Server/MDEV-37244/storage/innobase/btr/btr0bulk.cc:1179
|
#7 0x0000584418f13fd5 in row_merge_bulk_t::load_one_row (this=this@entry=0x7029700b7610, trx=trx@entry=0x702a14a07f00) at /data/Server/MDEV-37244/storage/innobase/row/row0merge.cc:5210
|
#8 0x0000584418f14334 in row_merge_bulk_t::bulk_insert_buffered (this=0x7029700b7610, row=..., ind=..., trx=trx@entry=0x702a14a07f00) at /data/Server/MDEV-37244/storage/innobase/row/row0merge.cc:5292
|
#9 0x0000584418f8a828 in trx_mod_table_time_t::bulk_insert_buffered (trx=0x702a14a07f00, index=..., entry=..., this=<optimized out>) at /data/Server/MDEV-37244/storage/innobase/include/trx0trx.h:503
|
#10 trx_undo_report_row_operation (thr=thr@entry=0x7029700fed68, index=index@entry=0x7029c0ceb950, clust_entry=clust_entry@entry=0x702978020e70, update=update@entry=0x0, cmpl_info=cmpl_info@entry=0x0, rec=rec@entry=0x0, offsets=0x0, roll_ptr=0x0) at /data/Server/MDEV-37244/storage/innobase/trx/trx0rec.cc:1875
|
#11 0x0000584418f092ac in row_ins_clust_index_entry_low (flags=flags@entry=0x0, mode=mode@entry=BTR_MODIFY_LEAF, index=index@entry=0x7029c0ceb950, n_uniq=n_uniq@entry=0x0, entry=entry@entry=0x702978020e70, n_ext=n_ext@entry=0x0, thr=0x7029700fed68) at /data/Server/MDEV-37244/storage/innobase/row/row0ins.cc:2819
|
#12 0x0000584418f09cde in row_ins_clust_index_entry (index=index@entry=0x7029c0ceb950, entry=entry@entry=0x702978020e70, thr=thr@entry=0x7029700fed68, n_ext=n_ext@entry=0x0) at /data/Server/MDEV-37244/storage/innobase/row/row0ins.cc:3331
|
#13 0x0000584418f09faf in row_ins_index_entry (index=0x7029c0ceb950, entry=0x702978020e70, thr=thr@entry=0x7029700fed68) at /data/Server/MDEV-37244/storage/innobase/row/row0ins.cc:3471
|
#14 0x0000584418f0a078 in row_ins_index_entry_step (node=node@entry=0x7029700feb10, thr=thr@entry=0x7029700fed68) at /data/Server/MDEV-37244/storage/innobase/row/row0ins.cc:3637
|
#15 0x0000584418f0a2ee in row_ins (node=node@entry=0x7029700feb10, thr=thr@entry=0x7029700fed68) at /data/Server/MDEV-37244/storage/innobase/row/row0ins.cc:3754
|
#16 0x0000584418f0a5c8 in row_ins_step (thr=thr@entry=0x7029700fed68) at /data/Server/MDEV-37244/storage/innobase/row/row0ins.cc:3883
|
#17 0x0000584418f22b93 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x702970077330 "\377", prebuilt=0x7029700fe690, ins_mode=<optimized out>) at /data/Server/MDEV-37244/storage/innobase/row/row0mysql.cc:1300
|
#18 0x0000584418df197b in ha_innobase::write_row (this=0x702970076b18, record=0x702970077330 "\377") at /data/Server/MDEV-37244/storage/innobase/handler/ha_innodb.cc:7775
|
#19 0x0000584418bbe9a3 in handler::ha_write_row (this=0x702970076b18, buf=0x702970077330 "\377") at /data/Server/MDEV-37244/sql/handler.cc:8003
|
#20 0x00005844188efcd3 in Write_record::single_insert (this=this@entry=0x702a2830f6c0, inserted=inserted@entry=0x702a2830f620) at /data/Server/MDEV-37244/sql/sql_insert.cc:2317
|
#21 0x00005844188f02ec in Write_record::write_record (this=this@entry=0x702a2830f6c0) at /data/Server/MDEV-37244/sql/sql_insert.cc:2359
|
#22 0x000058441891800a in read_sep_field (thd=thd@entry=0x702970000d58, info=..., table_list=table_list@entry=0x7029700138f8, fields_vars=..., set_fields=..., set_values=..., read_info=..., enclosed=..., skip_lines=<optimized out>, ignore_check_option_errors=0x1) at /data/Server/MDEV-37244/sql/sql_load.cc:1262
|
#23 0x000058441891a50f in mysql_load (thd=thd@entry=0x702970000d58, ex=0x702970013818, table_list=table_list@entry=0x7029700138f8, fields_vars=..., set_fields=..., set_values=..., handle_duplicates=DUP_ERROR, ignore=<optimized out>, read_file_from_client=0x0) at /data/Server/MDEV-37244/sql/sql_load.cc:706
|
#24 0x0000584418928c5f in mysql_execute_command (thd=thd@entry=0x702970000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=0x0) at /data/Server/MDEV-37244/sql/sql_parse.cc:4865
|
#25 0x000058441892c390 in mysql_parse (thd=thd@entry=0x702970000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x702a28310400) at /data/Server/MDEV-37244/sql/sql_parse.cc:7900
|
#26 0x000058441892da05 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x702970000d58, packet=packet@entry=0x70297000b0e9 " LOAD DATA INFILE 'load_table100_innodb_default' IGNORE INTO TABLE table100_innodb_default /* E_R Thread9 QNO 4344 CON_ID 23 */ ", packet_length=packet_length@entry=0x81, blocking=blocking@entry=0x1) at /data/Server/MDEV-37244/sql/sql_parse.cc:1905
|
The mtr object in frame 11 (row_ins_clust_index_entry_low()) is holding an exclusive latch on the clustered index root page. In frame 10 (trx_undo_report_row_operation() we are allocating a further mtr object (for no good reason, instead of reusing the one of our caller) for allocating undo log and writing a record. In frame 5, within a yet another mtr object, we are waiting for an exclusive latch on the index tree. This constitutes lock order inversion with respect to the rules that have been documented in WL#6326. An index tree latch may only be waited for while not holding any latches on pages of that index. (During reads, it is allowed to hold a latch on a secondary index leaf page while waiting for the clustered index tree latch.)
That this bug is being filed so many years after the regression was introduced in MDEV-24621 shows that a lock order inversion does not always lead to a deadlock or hang. A deadlock would require another thread that is participating in closing the waits-for cycle:
|
11.4 6087bdae8ce30614b9933c258238c80d440bccad |
Thread 8 (Thread 0x702a283f26c0 (LWP 3968782)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000584418f74b81 in srw_mutex_impl<true>::wait (this=this@entry=0x7029ff801bb8, lk=lk@entry=0x5) at /data/Server/MDEV-37244/storage/innobase/sync/srw_lock.cc:253
|
#2 0x0000584418f74d40 in ssux_lock_impl<true>::rd_lock_nospin (this=0x7029ff801bb8) at /data/Server/MDEV-37244/storage/innobase/sync/srw_lock.cc:410
|
#3 0x0000584418f74eaa in ssux_lock_impl<true>::rd_lock_spin (this=<optimized out>) at /data/Server/MDEV-37244/storage/innobase/sync/srw_lock.cc:395
|
#4 0x0000584418ea0243 in ssux_lock_impl<true>::rd_lock (this=this@entry=0x7029ff801bb8) at /data/Server/MDEV-37244/storage/innobase/include/srw_lock.h:353
|
#5 0x0000584418ea6507 in sux_lock<ssux_lock_impl<true> >::s_lock (this=this@entry=0x7029ff801bb8) at /data/Server/MDEV-37244/storage/innobase/include/sux_lock.h:403
|
#6 0x0000584418feb7f5 in buf_page_get_gen (page_id=..., zip_size=0x0, rw_latch=rw_latch@entry=RW_S_LATCH, guess=guess@entry=0x0, mode=mode@entry=0xa, mtr=mtr@entry=0x702a283eff60, err=0x702a283ef7d4) at /data/Server/MDEV-37244/storage/innobase/buf/buf0buf.cc:2938
|
#7 0x0000584418faa897 in btr_block_get (index=..., page=page@entry=0x3, latch_mode=RW_S_LATCH, mtr=mtr@entry=0x702a283eff60, err=err@entry=0x702a283ef7d4, first=first@entry=0x702a283ef7d3, page_get_mode=0xa) at /data/Server/MDEV-37244/storage/innobase/btr/btr0btr.cc:230
|
#8 0x0000584418fd5100 in btr_cur_t::open_leaf (this=0x70299020b798, first=0x1, index=0x7029c0ceb950, latch_mode=<optimized out>, mtr=0x702a283eff60) at /data/Server/MDEV-37244/storage/innobase/btr/btr0cur.cc:1942
|
#9 0x0000584418f0abfa in btr_pcur_t::open_leaf (this=<optimized out>, first=<optimized out>, index=<optimized out>, latch_mode=<optimized out>, mtr=<optimized out>) at /data/Server/MDEV-37244/storage/innobase/include/btr0pcur.h:392
|
#10 0x0000584418f4d241 in row_search_mvcc (buf=buf@entry=0x7029900dde60 "\377", mode=PAGE_CUR_G, prebuilt=0x70299020b5d0, match_mode=<optimized out>, direction=direction@entry=0x0) at /data/Server/MDEV-37244/storage/innobase/row/row0sel.cc:4846
|
#11 0x0000584418de9a76 in ha_innobase::index_read (this=this@entry=0x7029900dd648, buf=0x7029900dde60 "\377", key_ptr=key_ptr@entry=0x0, key_len=key_len@entry=0x0, find_flag=find_flag@entry=HA_READ_AFTER_KEY) at /data/Server/MDEV-37244/storage/innobase/handler/ha_innodb.cc:9016
|
#12 0x0000584418de9ddb in ha_innobase::index_first (this=this@entry=0x7029900dd648, buf=<optimized out>) at /data/Server/MDEV-37244/storage/innobase/handler/ha_innodb.cc:9364
|
#13 0x0000584418de9e02 in ha_innobase::rnd_next (this=0x7029900dd648, buf=<optimized out>) at /data/Server/MDEV-37244/storage/innobase/handler/ha_innodb.cc:9468
|
#14 0x0000584418bb736d in handler::ha_rnd_next (this=0x7029900dd648, buf=0x7029900dde60 "\377") at /data/Server/MDEV-37244/sql/handler.cc:3749
|
#15 0x0000584418866e11 in rr_sequential (info=0x702990016c78) at /data/Server/MDEV-37244/sql/records.cc:513
|
#16 0x0000584418976309 in READ_RECORD::read_record (this=0x702990016c78) at /data/Server/MDEV-37244/sql/records.h:81
|
#17 join_init_read_record (tab=0x702990016ba8) at /data/Server/MDEV-37244/sql/sql_select.cc:25328
|
#18 0x000058441896c87a in sub_select (join=0x702990015330, join_tab=0x702990016ba8, end_of_records=0x0) at /data/Server/MDEV-37244/sql/sql_select.cc:24253
|
#19 0x000058441897f846 in do_select (join=join@entry=0x702990015330, procedure=<optimized out>) at /data/Server/MDEV-37244/sql/sql_select.cc:23767
|
#20 0x000058441899d830 in JOIN::exec_inner (this=this@entry=0x702990015330) at /data/Server/MDEV-37244/sql/sql_select.cc:5064
|
#21 0x000058441899d8e8 in JOIN::exec (this=this@entry=0x702990015330) at /data/Server/MDEV-37244/sql/sql_select.cc:4855
|
#22 0x000058441899c6e2 in mysql_select (thd=thd@entry=0x702990000d58, tables=0x702990013eb0, fields=..., conds=0x0, og_num=0x0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=0x81040b00, result=0x702990014eb0, unit=0x702990005008, select_lex=0x702990013810) at /data/Server/MDEV-37244/sql/sql_select.cc:5380
|
#23 0x000058441899ca81 in handle_select (thd=thd@entry=0x702990000d58, lex=lex@entry=0x702990004f28, result=result@entry=0x702990014eb0, setup_tables_done_option=setup_tables_done_option@entry=0x0) at /data/Server/MDEV-37244/sql/sql_select.cc:642
|
#24 0x000058441891cfe1 in execute_sqlcom_select (thd=thd@entry=0x702990000d58, all_tables=0x702990013eb0) at /data/Server/MDEV-37244/sql/sql_parse.cc:6185
|
#25 0x0000584418926868 in mysql_execute_command (thd=thd@entry=0x702990000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=0x0) at /data/Server/MDEV-37244/sql/sql_parse.cc:3971
|
#26 0x000058441892c390 in mysql_parse (thd=thd@entry=0x702990000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x702a283f1400) at /data/Server/MDEV-37244/sql/sql_parse.cc:7900
|
#27 0x000058441892da05 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x702990000d58, packet=packet@entry=0x70299000b0e9 "SELECT * FROM `table100_innodb_default` INTO OUTFILE '/dev/shm/rqg_ext4/1760698156/221/gentest3968767.tmp' /* E_R Thread4 QNO 4066 CON_ID 20 */ ", packet_length=packet_length@entry=0x91, blocking=blocking@entry=0x1) at /data/Server/MDEV-37244/sql/sql_parse.cc:1905
|
This SELECT operation in Thread 8 is holding a latch on the clustered index tree. Before it got to wait for the root page latch, Thread 11 must have started its wait for an exclusive root page latch. Because InnoDB prioritizes writers, any wait for a shared latch will be blocked by a pending wait for an exclusive latch, until the exclusive latch has been granted and released.
I think that beyond fixing the bug, we need several changes here, both for performance and correctness:
- Use the local mtr_t object of row_ins_clust_index_entry_low() in trx_undo_report_row_operation().
- Move the rarely executed "start of bulk insert" code in row_ins_clust_index_entry_low() into a separate static ATTRIBUTE_COLD function. Anything beyond the conditions !(flags & BTR_NO_UNDO_LOG_FLAG) && page_is_empty(block->page.frame) could be checked in that function.
- Make BtrBulk use a caller-supplied mtr_t object.
- In mtr_t::x_lock(), mtr_t::s_lock() and mtr_t::u_lock(), assert get_savepoint()<2. After some refactoring of the code, we could also assert !get_savepoint() || btr_page_get_index_id(block_at_savepoint(0)->page.frame) != index.id).
Attachments
Issue Links
- is caused by
-
MDEV-24621 In bulk insert, pre-sort and build indexes one page at a time
-
- Closed
-