This was found by code inspection while debugging MDEV-7326. It is probably
unrelated to that bug, except for showing similar symptoms.
The problem occurs if the relay log (or the master binlog, from which relay logs
are copied) contains an event group with a GTID that is missing its end.
There is code in parallel replication that tries to handle this situation, but
it is insufficient: it miscounts count_committing_event_groups and
count_queued_event_groups. The result is that the following batch of event
groups will hang waiting for the prior groups to complete, because the partial
event group never "completes" for the purpose of this wait.
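To make the miscount concrete, here is a minimal toy model of the wait condition. It is only a sketch and is not the server code: ToyEntry, queue_group(), next_batch_may_start() and batch_unblocks_successor() are hypothetical names, and the assumption that a following batch waits until count_committing_event_groups has caught up with count_queued_event_groups is a simplification of the description above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only, NOT the MariaDB source. The two counters are named after
// the ones mentioned in this report; everything else is hypothetical.
struct ToyEntry {
  std::uint64_t count_queued_event_groups = 0;
  std::uint64_t count_committing_event_groups = 0;
};

// A new event group (GTID seen) is queued to a worker.
inline void queue_group(ToyEntry &e) { ++e.count_queued_event_groups; }

// The group has reached its commit step (stand-in for mark_start_commit()).
inline void mark_start_commit(ToyEntry &e) { ++e.count_committing_event_groups; }

// Assumed wait condition: the next batch may start only once every group
// queued before it has been counted as committing.
inline bool next_batch_may_start(const ToyEntry &e, std::uint64_t wait_count) {
  return e.count_committing_event_groups >= wait_count;
}

// Simulate one batch of `complete` normal groups, optionally followed by an
// incomplete group (GTID present, end missing). `count_abandoned` selects
// whether the abandon path also counts the incomplete group as committing.
inline bool batch_unblocks_successor(unsigned complete, bool truncated_group,
                                     bool count_abandoned) {
  ToyEntry e;
  for (unsigned i = 0; i < complete; ++i) {
    queue_group(e);
    mark_start_commit(e);
  }
  if (truncated_group) {
    queue_group(e);            // counted as queued...
    if (count_abandoned)
      mark_start_commit(e);    // ...but only counted as committing if handled
  }
  const std::uint64_t wait_count = e.count_queued_event_groups;
  return next_batch_may_start(e, wait_count);
}
```

In this model, batch_unblocks_successor(3, true, false) returns false, which corresponds to the hang: the incomplete group was queued but never counted as committing. With count_abandoned set to true the counters match again and the next batch can proceed.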
Basically, this code is missing a call to mark_start_commit() (it should be
checked whether anything else is missing too, perhaps the full
finish_event_group()?):
Here is an MTR test case to reproduce the bug. It requires a DBUG patch: