Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.6
-
None
Description
There is a hang in this test:
CURRENT_TEST: innodb.lock_move_wait_lock_race
mysqltest: At line 57: query 'DROP TABLE t' failed: <Unknown> (2013): Lost connection to server during query
|
probably caused by this:
#8 0x0000562390d6d325 in debug_sync (thd=0x7f030c000d58, sync_point_name=<optimized out>, name_len=23) at /mariadb/10.6/sql/debug_sync.cc:1760 |
#9 0x000056239114dfaa in lock_sys_t::cancel<false> (trx=trx@entry=0x7f0390880180, lock=0x7f0390880280) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:5792 |
#10 0x0000562391149a25 in lock_wait (thr=thr@entry=0x7f030c05eb18) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:1891 |
#11 0x00005623911e1547 in row_mysql_handle_errors (new_err=new_err@entry=0x7f0388620030, trx=trx@entry=0x7f0390880180, thr=thr@entry=0x7f030c05eb18, savept=savept@entry=0x0) at /mariadb/10.6/storage/innobase/row/row0mysql.cc:681 |
#12 0x000056239120d9f4 in row_search_mvcc (buf=buf@entry=0x7f030c064e98 "\377", mode=PAGE_CUR_GE, prebuilt=0x7f030c05e480, match_mode=<optimized out>, direction=direction@entry=0) at /mariadb/10.6/storage/innobase/row/row0sel.cc:5856 |
#13 0x0000562390fec356 in ha_innobase::index_read (this=0x7f030c052300, buf=0x7f030c064e98 "\377", key_ptr=0x7f030c060310 "\n", key_len=4, find_flag=HA_READ_KEY_EXACT) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:9060 |
in UPDATE t SET c = "abcdefghij" WHERE pk = 10 |
And this:
#8 0x0000562390d6d325 in debug_sync (thd=0x7f0314000d58, sync_point_name=<optimized out>, name_len=34) at /mariadb/10.6/sql/debug_sync.cc:1760 |
#9 0x0000562391137aba in operator() (__closure=__closure@entry=0x7f038b484066) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:3206 |
#10 0x000056239114137c in detail::scope_exit<lock_rec_store_on_page_infimum(const buf_block_t*, const rec_t*)::<lambda()> >::~scope_exit (this=0x7f038b484066, __in_chrg=<optimized out>) at /mariadb/10.6/include/scope.h:50 |
#11 lock_rec_store_on_page_infimum (block=block@entry=0x7f038b815950, rec=<optimized out>) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:3212 |
#12 0x00005623910969db in btr_cur_pessimistic_update (flags=flags@entry=10, cursor=cursor@entry=0x7f030804d708, offsets=offsets@entry=0x7f038b4841a8, offsets_heap=offsets_heap@entry=0x7f038b484248, entry_heap=entry_heap@entry=0x7f03141eaee0, big_rec=big_rec@entry=0x7f038b4841b8, update=0x7f030804d7e0, cmpl_info=1, thr=0x7f03141ee6f0, trx_id=2517, mtr=0x7f038b4844b0) at /mariadb/10.6/storage/innobase/btr/btr0cur.cc:4094 |
in UPDATE t SET c = NULL WHERE pk = 10.
The test was killed not to wait for debug sync point timeout.
Attachments
Issue Links
- relates to
-
MDEV-27701 Race on trx->lock.wait_lock between lock_rec_move() and lock_sys_t::cancel()
-
- Closed
-
Failed to reproduce it with:
./mtr --max-test-fail=1 --debug-sync-timeout=10 --parallel=256 innodb.lock_move_wait_lock_race{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} --repeat=1000
But according to the backtraces, it looks like trx2_moved_locks signal was lost. lock_rec_store_on_page_infimum() signals with trx2_moved_locks about it was entered, and waits for trx2_cont signal to continue. But trx2_cont is sent only after another connection receives trx2_moved_locks.
So, it looks like it hangs on SET DEBUG_SYNC="now WAIT_FOR trx2_moved_locks";. But why mtr reports connection was lost on table deletion?
The second backtrace shows that lock_rec_store_on_page_infimum_end sync point was reached, and it should emit trx2_moved_locks. But it was not caught.
There must not be signals overwriting, because a sequence of SET DEBUG_SYNC=s in default connection serializes sync points reaching:
--connection default
SET DEBUG_SYNC="now WAIT_FOR trx3_start_waiting";
COMMIT;
SET DEBUG_SYNC="now WAIT_FOR trx2_wait_end";
SET DEBUG_SYNC="now SIGNAL trx3_cont_waiting";
SET DEBUG_SYNC="now WAIT_FOR trx3_cancel_enter";
SET DEBUG_SYNC="now SIGNAL trx2_cont_upd";
SET DEBUG_SYNC="now WAIT_FOR trx2_moved_locks";
But the only reason I see why signal was emitted, but was not caught, is that it was overwritten by some other signal. It seems there were some changes in debug sync points, they could affect the test somehow, they need to be reviewed.
There was some suspicion about
MDEV-29593influence. But it did not find the way howMDEV-29593could influence. lock_rec_store_on_page_infimum() is invoked before page_cur_delete_rec(page_cursor, *offsets, mtr); in btr_cur_pessimistic_update() . And it's hanging before the record deleting. Besides, btr_cur_pessimistic_update() must be invoked under MTR_MEMO_PAGE_X_FIX.