[MDEV-32096] Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait Created: 2023-09-05 Updated: 2023-11-20 Resolved: 2023-09-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.6.0, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2 |
| Fix Version/s: | 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | performance | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Description |
|
This can severely affect optimistic (and aggressive) parallel replication. If the race is triggered, conflicts are not resolved correctly and parallel replication will be blocked until --innodb-lock-wait-timeout. This will be seen in SHOW PROCESSLIST as one worker being in the "killed" state and some other worker stuck in a query. A user reported a hang of parallel replication due to this, and knielsen spotted the data race: If the target transaction starts a lock wait roughly at the same time as innobase_kill_query() is invoked, then trx->lock.wait_lock could be read as nullptr and the lock wait would not be interrupted. Therefore, we need to acquire lock_sys.wait_mutex before checking if a lock wait needs to be aborted. Attached mdev32096_testcase.patch is an (ugly) ./mtr testcase that triggers the problem. |
| Comments |
| Comment by Marko Mäkelä [ 2023-09-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
knielsen, thank you for mdev32096_testcase.patch
Without the fix, the test case hangs where it is expected:
Here is the test case:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
The fix looks good to me. I have some proposal of how to avoid unnecessary cursor restoring under some circumstances. In the case if innobase_kill_query() was finished after lock_rec_enqueue_waiting() call and before the first trx->lock.wait_lock check in lock_wait() in parallel thread(suppose we change lock_rec debug sync point to lock_wait_start in your test), lock_wait() returns DB_SUCCESS as trx->lock.was_chosen_as_deadlock_victim was not set(see the following code).
And then row_search_mvcc() jumps to rec_loop label, as row_mysql_handle_errors() returns true, where trx_is_interrupted(trx) is checked.
Maybe we could make the path shorter and check if the transaction was interrupted in the above block in lock_wait(), so that lock_wait() would return DB_INTERRUPTED instead of DB_SUCCESS? Such a way we could avoid unnecessary cursor restoring. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
vlad.lesin, thank you, your suggestion makes sense. The following would implement it:
I tested it with the following:
Between the execution of lock_wait() getting past the lock_wait_start point and the server shutdown, there was no call to trx_is_interrupted(). The trx->error_state (of DB_INTERRUPTED, set by innobase_kill_query()) is being returned “as is”. |