Details
New Feature
Status: In Progress
Major
Resolution: Unresolved
None
Description
When there is a problem with optimistic parallel replication, such as excessive conflicts or a hang or slowness caused by bugs, very little information is available to investigate it. This makes debugging extremely difficult, and many occurrences that could have been used to track down and fix a bug, to the benefit of all users, end up being ignored.
A simple idea that should greatly improve this is to implement an option --slave-parallel-print-all-deadlocks, inspired by --innodb-print-all-deadlocks. This option, when enabled, will output additional information in the error log about parallel replication conflicts:
- When a conflict is detected: the blocked GTID and the blocking GTID that is to be aborted, along with the state and active query of each associated worker thread.
- When an event group needs to retry: a dump of the chain of wait-for-prior-commit threads, and the SHOW ENGINE ... STATUS output for all engines participating in the transaction.
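To make the "chain of wait-for-prior-commit threads" concrete, here is a minimal sketch in Python of how such a chain could be walked and formatted for the error log. It is illustrative only and is not server code: the `Worker` record, its fields, and the functions `wait_chain` and `format_chain` are hypothetical names, and the log line layout is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Worker:
    """Hypothetical snapshot of one parallel replication worker thread."""
    gtid: str                        # GTID of the event group the worker is applying
    state: str                       # thread state at the time of the dump
    query: str                       # active query, if any
    waits_for: Optional[str] = None  # GTID whose commit this worker waits for


def wait_chain(workers: Dict[str, Worker], start_gtid: str) -> List[Worker]:
    """Follow waits_for links from start_gtid until the chain ends.

    A visited set guards against looping forever if the snapshot is
    inconsistent and contains a cycle.
    """
    chain: List[Worker] = []
    seen = set()
    gtid: Optional[str] = start_gtid
    while gtid is not None and gtid in workers and gtid not in seen:
        seen.add(gtid)
        worker = workers[gtid]
        chain.append(worker)
        gtid = worker.waits_for
    return chain


def format_chain(chain: List[Worker]) -> List[str]:
    """Render one log line per worker in the chain (assumed layout)."""
    return [
        f"GTID {w.gtid} state='{w.state}' query='{w.query}'"
        for w in chain
    ]
```

Calling `format_chain(wait_chain(workers, gtid))` for the retrying event group would yield one line per worker, starting with the retrying transaction and ending with the transaction at the head of the commit order.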
Two cases are of particular interest: when an event group needs to retry due to a lock wait timeout, and when it needs to retry more than once. Neither is expected in normal operation, and either might indicate a bug, so it will be useful to be able to enable the new option for these cases only, making it feasible to leave it enabled permanently in production environments.
It should also be possible to enable the option for all conflicts. This will be useful for gathering information during specific investigations, but might produce too much output or overhead in normal use.
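The two levels of verbosity described above could be modelled roughly as follows. This is a sketch in Python under stated assumptions, not a proposal for the actual option syntax: the mode names `OFF`, `UNEXPECTED`, and `ALL`, the function `should_print`, and its parameters are all hypothetical.

```python
from enum import Enum


class PrintMode(Enum):
    """Hypothetical settings for the proposed option."""
    OFF = "off"                # never print conflict details
    UNEXPECTED = "unexpected"  # only lock wait timeouts and repeated retries
    ALL = "all"                # print details for every conflict


def should_print(mode: PrintMode, retry_count: int, lock_wait_timeout: bool) -> bool:
    """Decide whether to log details for a retry of an event group.

    retry_count counts retries of this event group including the current
    one, so retry_count > 1 means it is being retried more than once.
    """
    if mode is PrintMode.ALL:
        return True
    if mode is PrintMode.UNEXPECTED:
        return lock_wait_timeout or retry_count > 1
    return False
```

With this split, an ordinary first retry after a conflict produces no output in the `UNEXPECTED` mode, which is what would keep it cheap enough to leave enabled in production.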