Details
- Task
- Status: Closed
- Major
- Resolution: Duplicate
- None
Description
In MDEV-27161 we introduced a way to prevent the SQL thread from waiting too long: it stops with an error when slave_max_statement_time is reached. This allows a DBA to notice a problematic slave and step in. The problem is that in one of the typical real-life cases, where a local transaction or SELECT runs for too long (holding a metadata lock) and blocks replication, the only useful action is to terminate the conflicting transaction or statement. This is what we do with Galera replication, where the applier thread uses "brute force" to terminate conflicting local statements.
We need a way to automate such actions for larger setups with many instances: an option to terminate the specific transaction that caused the SQL thread to wait (either after waiting for slave_max_statement_time or immediately), so that replication gets priority and continues at the cost of errors in other, local threads.
Multi-source and parallel replication should also be carefully considered while implementing this. Proper logging options (similar to those we have in Galera) should also be added, to help explain any error that a local connection to the slave may get.
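As a rough illustration of the requested policy, here is a minimal Python sketch, not the server implementation. It assumes the list of connections blocking the SQL thread is already known; the `LockHolder` record, the `kill_statements` helper and the `abort_after_seconds` parameter are hypothetical names invented for this example. It emits `KILL CONNECTION` statements for local blockers once the SQL thread has waited long enough (a threshold of 0 means "immediately") and never targets other replication threads, which matters for multi-source and parallel replication.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LockHolder:
    """Hypothetical record: one connection holding a lock the SQL thread waits on."""
    connection_id: int
    held_seconds: float
    is_replication_thread: bool = False  # e.g. another applier or worker thread


def kill_statements(holders: Iterable[LockHolder],
                    waited_seconds: float,
                    abort_after_seconds: float) -> List[str]:
    """Return KILL statements for local blockers, longest lock holder first.

    Nothing is returned until the SQL thread has waited at least
    abort_after_seconds; a threshold of 0 kills blockers immediately.
    """
    if waited_seconds < abort_after_seconds:
        return []
    # Assumption: replication threads are never chosen as victims.
    victims = sorted(
        (h for h in holders if not h.is_replication_thread),
        key=lambda h: h.held_seconds,
        reverse=True,
    )
    return [f"KILL CONNECTION {h.connection_id}" for h in victims]


if __name__ == "__main__":
    blockers = [LockHolder(11, 5.0), LockHolder(12, 30.0), LockHolder(13, 99.0, True)]
    for statement in kill_statements(blockers, waited_seconds=10.0, abort_after_seconds=10.0):
        print(statement)
```

In a real deployment, each emitted statement would also need to be logged together with the reason, so that the owner of the killed connection can understand the error it receives.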
Issue Links
- duplicates
  - MDEV-27131 Add option for SQL thread to kill any blocking local activity (Closed)
- is duplicated by
  - MDEV-34857 Implement --slave-abort-blocking-timeout (Closed)