[MDEV-30940] innodb.lock_move_wait_lock_race hangs sporadically Created: 2023-03-28 Updated: 2024-01-17 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Tests |
| Affects Version/s: | 10.6 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Vladislav Lesin | Assignee: | Vladislav Lesin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
There is a hang in this test:
probably caused by this:
And this:
in UPDATE t SET c = NULL WHERE pk = 10. The test was killed so as not to wait for the debug sync point timeout. |
| Comments |
| Comment by Vladislav Lesin [ 2023-03-28 ] |
|
Failed to reproduce it with:
But according to the backtraces, it looks like the trx2_moved_locks signal was lost. lock_rec_store_on_page_infimum() emits trx2_moved_locks once it has been entered, and then waits for the trx2_cont signal to continue. But trx2_cont is sent only after another connection receives trx2_moved_locks. So it looks like the test hangs on SET DEBUG_SYNC="now WAIT_FOR trx2_moved_locks";. But why does mtr report that the connection was lost on table deletion? The second backtrace shows that the lock_rec_store_on_page_infimum_end sync point was reached, so it should have emitted trx2_moved_locks. But the signal was not caught.
The only reason I can see for a signal being emitted but not caught is that it was overwritten by some other signal. There seem to have been some changes in debug sync points that could affect the test somehow; they need to be reviewed. There was some suspicion about |
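The overwrite hypothesis above can be illustrated with a toy model. This is only a sketch under a stated assumption: that emitted signals live in a single global slot, so a later signal replaces an earlier one before the waiter looks at it. The class SingleSlotSignals, the helper waiter_sees, and the signal name unrelated_signal are hypothetical and are not taken from the server code or from this report.

```python
class SingleSlotSignals:
    """Toy model (assumption): one global slot holds the latest signal."""

    def __init__(self):
        self.current = None

    def signal(self, name):
        # A new signal replaces whatever was in the slot before.
        self.current = name

    def wait_for(self, name):
        # Non-blocking check: would a waiter for `name` be released now?
        return self.current == name


def waiter_sees(emitted, awaited):
    """Emit all signals in order, then check whether a waiter that
    starts waiting only afterwards would observe `awaited`."""
    slot = SingleSlotSignals()
    for name in emitted:
        slot.signal(name)
    return slot.wait_for(awaited)


if __name__ == "__main__":
    # The expected flow: the signal is still in the slot when awaited.
    print(waiter_sees(["trx2_moved_locks"], "trx2_moved_locks"))
    # A hypothetical unrelated signal arrives first and overwrites it.
    print(waiter_sees(["trx2_moved_locks", "unrelated_signal"],
                      "trx2_moved_locks"))
```

In this model the second call returns False: the waiter never observes trx2_moved_locks, so it would never send trx2_cont, and both sides would wait until the debug sync timeout.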
| Comment by Marko Mäkelä [ 2024-01-17 ] |
|
I got this again on 10.6 today, exactly as in the Description. One innobase_kill_query is blocked by lock_sys.wait_mutex, which is held by an UPDATE t SET c = "abcdefghij" WHERE pk = 10. That transaction had not written any undo log (trx->undo_no==0) and was not holding any other record locks. If I got it right, it was waiting for a lock on the page infimum record on the clustered index root page. The UPDATE t SET c = NULL WHERE pk = 10 had written only one undo log record. All history up to these two transactions had been purged. vlad.lesin has a hypothesis that an unexpected transaction due to dict_stats_save() could ruin the DEBUG_SYNC flow. I will check whether the following fixes this occasional failure:
|