In parallel replication, if T2 blocks T1 on an InnoDB row lock, we deadlock
kill T2 and retry it.
If T2 has already started to commit, it may have done mark_start_commit() by the
time it is deadlock killed. In retry_event_group(), we do
unmark_start_commit() before doing the rollback. The idea is that T1 cannot
reach its own mark_start_commit() until T2 has rolled back and released its
row lock. So we are sure to get unmark_start_commit() in T2 before
mark_start_commit() in T1. This way, a following T3 will not start running
until the retry of T2 has completed.
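The intended ordering can be pictured with a simple counter. This is a hypothetical, heavily simplified model, not MariaDB's implementation: `GroupGate`, `started`, `next_group_may_start()` and `intended_order()` are invented names, and only `mark_start_commit()` and `unmark_start_commit()` come from the text. It assumes that T3 (the next group commit on the master) may start once every event group of the current one (T1 and T2) is marked.

```python
class GroupGate:
    """Hypothetical model: the next master group commit (T3) may start only
    when every event group of the current one (T1, T2) is marked."""

    def __init__(self, group_size):
        self.group_size = group_size  # event groups in the current group commit
        self.started = 0              # how many are currently marked

    def mark_start_commit(self):
        self.started += 1

    def unmark_start_commit(self):
        # Undo the mark of a transaction that must be retried.
        self.started -= 1

    def next_group_may_start(self):
        return self.started == self.group_size


def intended_order():
    """Replay the intended ordering; record whether T3 may start after each step."""
    gate = GroupGate(group_size=2)
    trace = []

    def step(name, action):
        action()
        trace.append((name, gate.next_group_may_start()))

    step("T2 mark_start_commit", gate.mark_start_commit)
    # T2 is deadlock killed; retry_event_group() unmarks before the rollback.
    step("T2 unmark_start_commit", gate.unmark_start_commit)
    # The rollback releases the row lock, so T1 can now reach its own mark.
    step("T1 mark_start_commit", gate.mark_start_commit)
    # The retry of T2 reaches its commit point, marks again and opens the gate.
    step("T2 retry mark_start_commit", gate.mark_start_commit)
    return trace


for name, may_start in intended_order():
    print(f"{name:28} T3 may start: {may_start}")
```

In this ordering T3 stays blocked until the final step, when the retry of T2 has marked again.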
But this turns out not to work as expected. The reason is that
ha_commit_trans() does a rollback if the commit fails.
Thus we can have the following situation:
1. T2 starts committing; it is waiting in queue_for_group_commit() for T1 to
commit.
2. We detect the deadlock and kill T2. T2 returns an error from
log_and_order(), and ha_commit_trans() does ha_rollback_trans().
3. T1 can now proceed because of the rollback, and it does its own
mark_start_commit().
4. T3 sees that T1 and T2 have both started to commit, and starts executing.
5. T2 does unmark_start_commit(). At this point, T2 and T3 are running in
parallel, even though they should not, as they are from different group
commits on the master.
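Steps 1 to 5 can be replayed with a hypothetical counter of marked event groups (`actual_order()`, `started` and `t3_running` are invented for illustration, not server code). The point it shows: because the rollback inside ha_commit_trans() happens while T2 is still marked, T1's mark brings the count up to the group size and T3 starts before T2 has unmarked.

```python
def actual_order():
    """Replay steps 1-5 with a hypothetical counter of marked event groups."""
    group_size = 2      # T1 and T2 belong to one group commit on the master
    started = 0         # event groups that have done mark_start_commit()
    t3_running = False
    trace = []

    def record(name):
        trace.append((name, started == group_size, t3_running))

    started += 1        # 1. T2 marks, then waits for T1 in queue_for_group_commit()
    record("1. T2 marked, waiting")
    # 2. T2 is killed; ha_commit_trans() rolls back while T2 is still marked.
    record("2. T2 rolled back")
    started += 1        # 3. T1 gets the row lock and marks
    record("3. T1 marked")
    t3_running = started == group_size  # 4. T3 sees both marked and starts
    record("4. T3 starts")
    started -= 1        # 5. T2 unmarks, but T3 is already running
    record("5. T2 unmarked")
    return trace


for name, gate_open, t3 in actual_order():
    print(f"{name:24} gate open: {gate_open!s:5} T3 running: {t3}")
```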
It was first thought that this condition does not cause any user-visible
problems (after the fix for MDEV-7326). However, MDEV-8302 shows one example
where this can cause replication to fail. If T2 deletes a row with the same
unique key value that T3 inserts, then running T3 in parallel with T2 can
cause T3 to fail with a duplicate key error. Similar scenarios could cause
other failures when T3 runs too early.
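A toy model of the duplicate key case, assuming a unique index represented as a Python set; `DuplicateKeyError`, `insert_unique()`, `replay_t3()` and the key value 42 are all hypothetical. T2's DELETE of the key was undone by the rollback, so if T3 runs before the retry of T2 re-applies it, T3's INSERT of the same key collides with the restored row.

```python
class DuplicateKeyError(Exception):
    pass


def insert_unique(index, key):
    """Insert into a toy unique index, failing as a duplicate-key error would."""
    if key in index:
        raise DuplicateKeyError(f"Duplicate entry '{key}'")
    index.add(key)


def replay_t3(t2_delete_applied):
    """T2 deletes key 42, then T3 inserts key 42 (master order)."""
    unique_index = {42}            # the row exists before T2
    if t2_delete_applied:
        unique_index.discard(42)   # T2's retry has re-applied its DELETE
    # Otherwise T2's rollback has restored the row and T3 ran too early.
    try:
        insert_unique(unique_index, 42)   # T3's INSERT
        return "ok"
    except DuplicateKeyError as err:
        return f"error: {err}"


print(replay_t3(t2_delete_applied=True))   # prints "ok"
print(replay_t3(t2_delete_applied=False))  # prints "error: Duplicate entry '42'"
```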
Perhaps ha_commit_trans() needs a check so that it does not roll back when the
commit failed because of a parallel replication deadlock kill.
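A hypothetical sketch of why such a check would restore the intended ordering, using a simplified counter of marked event groups. `t3_starts_too_early()` and its `skip_rollback_in_commit` parameter are invented stand-ins for the suggested check; they are not an existing option, variable or function in the server. If the commit path leaves the rollback to retry_event_group(), the unmark happens while T2 still holds the row lock, so T1 cannot mark in between.

```python
def t3_starts_too_early(skip_rollback_in_commit):
    """Hypothetical replay: True if T3 can start before T2's retry is done."""
    group_size = 2
    started = 1               # T2 already did mark_start_commit()
    t2_holds_row_lock = True  # T1 is blocked on T2's InnoDB row lock
    t3_started = False

    def rollback_t2():
        # Rolling back T2 frees the lock; T1 proceeds to its own mark.
        nonlocal started, t2_holds_row_lock, t3_started
        t2_holds_row_lock = False
        started += 1
        t3_started = started == group_size   # T3 checks the mark count

    # T2's commit fails because it was deadlock killed.
    if not skip_rollback_in_commit:
        rollback_t2()             # current behavior: ha_commit_trans() rolls back
    started -= 1                  # retry_event_group(): unmark_start_commit()
    if t2_holds_row_lock:
        rollback_t2()             # with the check, the rollback happens only here
    return t3_started


print(t3_starts_too_early(skip_rollback_in_commit=False))  # True: the bug
print(t3_starts_too_early(skip_rollback_in_commit=True))   # False: T3 waits
```

The design choice being modelled is simply ordering: the unmark must happen before anything that releases T2's row lock.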