Details
-
Type:
Bug
-
Status: Closed (View Workflow)
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: 10.0.7
-
Fix Version/s: 10.0.9
-
Component/s: None
-
Labels:None
Description
This failure is a race that is quite rare, but I can usually repeat it with
eg.
./mtr rpl.rpl_gtid_basic --repeat=100 --parallel=6
The problem is a deadlock between MYSQL_BIN_LOG::reset_logs() and
MYSQL_BIN_LOG::mark_xid_done(). The former takes LOCK_log and waits for the
latter to complete. But the latter also tries to take LOCK_log; this can lead
to a deadlock.
There is already code that tries to deal with this, with the flag
reset_master_pending. However, there was still a small opportunity for
deadlock, when an previous mark_xid_done() is still running when reset_logs()
is called and is at the precise point where it first releases LOCK_xid_list
and then re-aquires both LOCK_log and LOCK_xid_list.
Proposed solution: set reset_master_pending in reset_logs() before taking
LOCK_log. And also count how many invocations of LOCK_xid_list are in the
progress of releasing and re-aquiring locks, and in reset_logs() wait for that
number to drop to zero after setting reset_master_pending and before taking
LOCK_log.