[MDEV-37586] Optimize Conflict Handling for Parallel Replication - Jira

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Replication
Labels: None

Epic Link: Reduce Slave Lag

Description

Conflict handling in optimistic/aggressive parallel replication can be improved in InnoDB.

In the current (aka 12.1) code, conflicts are detected by InnoDB calling thd_rpl_deadlock_check() every time transaction T1 enqueues a wait for a (row) lock held by T2, or (IIUC) when T1 enqueues a wait for a lock that T2 is already waiting for.

The thd_rpl_deadlock_check() function compares the sub_id of T1 and T2 (if they are parallel replication transactions) to check the commit order; if T1 comes ahead of T2 in commit order, then this is a deadlock, because T2 will eventually need to wait_for_prior_commit on T1.
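
The commit-order comparison described above can be sketched roughly as follows. This is an illustration only, not the server code: `RplTxn`, its fields, and `must_kill_holder()` are hypothetical names, and the sketch assumes that a lower sub_id means earlier in commit order and that two distinct transactions never share a sub_id.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a transaction (not the server's THD).
struct RplTxn {
  bool is_parallel_replication;  // applied by a parallel replication worker?
  std::uint64_t sub_id;          // assumed: lower value commits first
};

// Returns true when `waiter` (T1) waiting for a lock held by `holder` (T2)
// is a commit-order deadlock, i.e. T2 must be killed and rolled back.
bool must_kill_holder(const RplTxn& waiter, const RplTxn& holder) {
  // The check only applies when both sides are parallel replication
  // transactions; anything else is left to normal lock waits.
  if (!waiter.is_parallel_replication || !holder.is_parallel_replication)
    return false;
  // Assumption of this sketch: distinct transactions have distinct sub_ids.
  assert(waiter.sub_id != holder.sub_id);
  // T1 commits first, but T2 holds the lock and will later have to
  // wait_for_prior_commit on T1: neither can make progress.
  return waiter.sub_id < holder.sub_id;
}
```

In the opposite case (T1 is later in commit order than T2), the wait is harmless: T2 will commit and release the lock without depending on T1.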

This requires killing T2 and rolling it back so that T1 can proceed, preferably as quickly as possible, since T1 will be blocking later transactions from committing.

In the current code, thd_rpl_deadlock_check() does not directly kill T2; this is because it is not known which mutexes could be held at this point, and calling THD::awake() might violate the mutex locking order on LOCK_thd_data or LOCK_thd_kill. Instead, a kill request is enqueued for another thread to execute. This means the kill is delayed by a thread context switch and by scheduling.

Some possible improvements:

1. Instead of using the server manager thread, as currently, use a dedicated rollback thread to minimize the delay of the kill. If possible, give this thread a high scheduling priority. (This was originally the case, but a patch was made at some point to remove the dedicated thread, a wrong patch IMHO).

2. Ideally, the kill would be done synchronously, from within the thread context of T1 when the conflict is detected, avoiding the overhead of a thread context switch. This would require investigating how this can be done safely, possibly by trying mysql_mutex_trylock() and falling back to the async kill if it fails.

3. Even better would be if InnoDB, in the thread context of T1, could directly cancel the transaction T2, steal the conflicting lock and grant it to T1. This way, T1 would not get a wait at all. This would not always be possible I think, depending on what T2 is doing. But in the common case that T2 is in a quiescent state, where it is waiting for another InnoDB row lock or in wait_for_prior_commit, I think it might be possible. At least, I remember seeing something similar done in InnoDB's own deadlock detector.

4. When multiple transactions T3, T4, T5, ... are waiting on an InnoDB row lock and the lock becomes free, InnoDB currently grants the lock to a "random" transaction (I think the first transaction to enqueue the wait, though the details can be more complicated). When the waiters are parallel replication transactions, it would be much preferable if InnoDB granted the lock to the first of them in commit order, because granting it to any later transaction will just result in that transaction eventually getting killed and rolled back, delaying the lock grant that would actually progress replication. IIUC, currently all transactions that conflict would get an async kill scheduled, but thread scheduling could still cause the lock to be granted to the wrong transaction, wasting work.
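
Ideas 2 and 4 above can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not a proposal for the actual patch: `Txn`, `KillPath`, `kill_or_defer()` and `first_in_commit_order()` are hypothetical names, `std::mutex` stands in for the server mutexes (LOCK_thd_kill / LOCK_thd_data), a plain vector stands in for the async kill queue, and a lower sub_id is assumed to mean earlier in commit order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a transaction; not the server's THD or trx_t.
struct Txn {
  std::uint64_t sub_id = 0;  // assumed: lower value commits first
  std::mutex kill_mutex;     // stands in for LOCK_thd_kill / LOCK_thd_data
  bool killed = false;
};

enum class KillPath { Synchronous, Deferred };

// Idea 2: attempt the kill from the detecting thread without blocking.
// If the mutex cannot be taken immediately, fall back to queueing the
// kill for another thread, as the description says is done today.
KillPath kill_or_defer(Txn& victim, std::vector<Txn*>& deferred_kills) {
  std::unique_lock<std::mutex> guard(victim.kill_mutex, std::try_to_lock);
  if (guard.owns_lock()) {
    victim.killed = true;  // synchronous path: no context switch needed
    return KillPath::Synchronous;
  }
  deferred_kills.push_back(&victim);  // async fallback
  return KillPath::Deferred;
}

// Idea 4: among the transactions waiting for a freed lock, choose the one
// that is first in commit order instead of the first to have enqueued.
Txn* first_in_commit_order(const std::vector<Txn*>& waiters) {
  assert(!waiters.empty());
  return *std::min_element(
      waiters.begin(), waiters.end(),
      [](const Txn* a, const Txn* b) { return a->sub_id < b->sub_id; });
}
```

The try-lock never blocks, so the detecting thread cannot introduce a new lock-order deadlock through it; the cost is that the deferred path is still needed whenever the mutex is contended. Whether the real mutexes can safely be tried at this point is exactly the open question raised in item 2.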

Issue Links

is part of MDEV-37582 Reduce Slave Lag (Open)

People

Assignee: Kristian Nielsen

Reporter: Brandon Nesterenko

Votes: 0

Watchers: 3

Dates

Created: 2025-09-08 19:09

Updated: 2025-09-09 15:09