[MDEV-25016] Race condition between lock_sys_t::cancel() and page split or merge Created: 2021-03-01 Updated: 2021-03-09 Resolved: 2021-03-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6 |
| Fix Version/s: | 10.6.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance | ||
| Issue Links: |
|
| Description |
|
In
It seems that we can work around this bug by making innobase_kill_query() acquire an exclusive lock_sys.latch instead of a shared one. This work-around will obviously hurt performance, and I suspect that it merely reduces the probability of such hangs rather than eliminating them. Until this bug is fixed, we can apply the work-around whenever thd_need_wait_reports() holds. Note: thd_need_wait_reports() holds even when replication is not in use and only the option log_bin is enabled. That condition seems to be necessary, because without it, the test binlog.rpl_parallel_optimistic would hang (until innodb_lock_wait_timeout expires). |
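The latch-mode choice described above can be sketched roughly as follows. This is only an illustration and not the InnoDB code: toy_lock_sys, toy_thd and toy_kill_query() are hypothetical names, std::shared_mutex merely stands in for lock_sys.latch, and the need_wait_reports flag stands in for the result of thd_need_wait_reports().

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for lock_sys; the real lock_sys.latch is not a
// std::shared_mutex. The counters exist only to make the sketch observable.
struct toy_lock_sys {
  std::shared_mutex latch;
  std::atomic<int> exclusive_acquisitions{0};
  std::atomic<int> shared_acquisitions{0};
};

// Hypothetical stand-in for a connection; need_wait_reports models what
// thd_need_wait_reports() would return for it.
struct toy_thd {
  bool need_wait_reports;
};

// Sketch of the work-around: use the exclusive latch mode only when wait
// reports are needed, and the cheaper shared mode otherwise.
// Returns true if the exclusive mode was used.
bool toy_kill_query(toy_lock_sys &sys, const toy_thd &thd) {
  if (thd.need_wait_reports) {
    std::unique_lock<std::shared_mutex> guard(sys.latch);
    ++sys.exclusive_acquisitions;
    // ... the lock wait would be cancelled here ...
    return true;
  }
  std::shared_lock<std::shared_mutex> guard(sys.latch);
  ++sys.shared_acquisitions;
  // ... the lock wait would be cancelled here ...
  return false;
}
```

In this sketch, a call with need_wait_reports set serializes against every other latch holder, which is where the performance cost of the work-around would come from; calls without it can still run concurrently with other shared holders.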
| Comments |
| Comment by Marko Mäkelä [ 2021-03-02 ] | |||||||||||||
|
To repeat this, apply the following patch:
and run the following:
On my system, I can observe that the CPU load drops as more and more tests hang. Eventually, one of the tests times out. Without the patch, the tests complete. | |||||||||||||
| Comment by Marko Mäkelä [ 2021-03-02 ] | |||||||||||||
|
I am sorry for the false alarm. It turns out that my final fix of With that problem fixed, the 3 tests pass even if I remove my workaround. Waiting table lock requests, which are only possible in special cases, can be released while holding only a shared lock_sys.latch. |