Details

Type: Task
Priority: Minor
Status: Open
Resolution: Unresolved
Description
Currently, when parallel replication needs to retry a transaction due to a
deadlock or other error, it has to re-open the relay log file, seek to the
correct position, and re-read the events from the file.
This is necessary in the general case, where a transaction may be too large
to fit in memory. But in the common case it is wasteful, as the events are
likely still available in memory, in the list of events queued up for the
worker thread.
This list is only freed in batches for efficiency, so in most cases the events
will still be in the list when a transaction needs to be retried.
Transaction retry efficiency becomes somewhat more important with MDEV-6676,
speculative parallel replication. Thus, it might be worthwhile to implement a
simple facility for this.
Say, the worker thread, when freeing queued events, could keep the last event
group around unless it would require more than (--slave-max-queued-events/3)
bytes of memory. Then on transaction retry, if the entire transaction to be
retried is still in the queue, the events could be executed from there rather
than by re-opening and reading the relay log file.
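A minimal sketch of this retention policy. All names here (Event, EventGroup,
WorkerQueue) are illustrative stand-ins, not actual server internals; the
divide-by-three limit follows the proposal above:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One replicated event; in the server this would be a Log_event.
struct Event {
    std::string data;
    std::size_t size() const { return data.size(); }
};

// All events of one transaction (event group).
struct EventGroup {
    long id;                      // identifies the transaction
    std::vector<Event> events;
    std::size_t bytes() const {
        std::size_t n = 0;
        for (const auto &e : events) n += e.size();
        return n;
    }
};

class WorkerQueue {
public:
    explicit WorkerQueue(std::size_t max_queued_bytes)
        : retain_limit_(max_queued_bytes / 3) {}

    // Called when freeing an executed event group: keep it around for a
    // possible retry, unless it would tie up too much memory.
    void free_group(EventGroup group) {
        if (group.bytes() <= retain_limit_)
            retained_ = std::move(group);   // small enough: keep for retry
        else
            retained_.reset();              // too big: retry must re-read log
    }

    // On retry: return the in-memory copy if the whole transaction is
    // still available, otherwise nullptr (caller falls back to reading
    // the relay log).
    const EventGroup *find_for_retry(long id) const {
        if (retained_ && retained_->id == id)
            return &*retained_;
        return nullptr;
    }

private:
    std::size_t retain_limit_;
    std::optional<EventGroup> retained_;
};
```

The point of the limit is that retry is an optimization only; a huge
transaction simply falls back to the existing relay-log path.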
The main problem with this approach is testing. The code path that reads
events from the relay log during retry would be executed very rarely, so
fatal bugs could hide there and be very hard to deal with when they finally
turn up. I think some DBUG injection should be used so that the existing
retry mtr test cases also exercise the rare relay-log-read code path and keep
it tested.
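As a sketch of how such an injection could steer the decision, here is a
stand-in for the server's DBUG keyword machinery (in the real server the
keyword would be enabled from an mtr test via debug_dbug; the keyword name
rpl_retry_force_relay_log_read and all other names are hypothetical):

```cpp
#include <set>
#include <string>

// Stand-in for the DBUG keyword set normally managed by dbug.h.
static std::set<std::string> g_dbug_keywords;

static bool dbug_keyword_set(const std::string &kw) {
    return g_dbug_keywords.count(kw) != 0;
}

enum class RetrySource { IN_MEMORY_QUEUE, RELAY_LOG };

// Decide where to replay the transaction from on retry.  The injected
// keyword forces the rarely-taken relay-log path, so existing retry
// test cases keep covering it even when the in-memory copy is available.
RetrySource choose_retry_source(bool events_still_queued) {
    if (dbug_keyword_set("rpl_retry_force_relay_log_read"))
        return RetrySource::RELAY_LOG;       // forced by test injection
    return events_still_queued ? RetrySource::IN_MEMORY_QUEUE
                               : RetrySource::RELAY_LOG;
}
```

With this shape, the existing retry tests can be run once normally and once
with the keyword enabled, so both paths stay exercised.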