Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
10.0.7
-
None
-
None
Description
This failure is a rare race, but I can usually reproduce it with, e.g.:
./mtr rpl.rpl_gtid_basic --repeat=100 --parallel=6
The problem is a deadlock between MYSQL_BIN_LOG::reset_logs() and
MYSQL_BIN_LOG::mark_xid_done(). The former takes LOCK_log and waits for the
latter to complete. But the latter also tries to take LOCK_log; this can lead
to a deadlock.
There is already code that tries to deal with this, using the flag
reset_master_pending. However, a small window for deadlock remained: a
previous mark_xid_done() may still be running when reset_logs() is called, at
the precise point where it has released LOCK_xid_list and is about to
re-acquire both LOCK_log and LOCK_xid_list.
Proposed solution: set reset_master_pending in reset_logs() before taking
LOCK_log. Also count how many invocations of mark_xid_done() are in the
process of releasing and re-acquiring locks, and in reset_logs() wait for that
number to drop to zero after setting reset_master_pending and before taking
LOCK_log.
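
The ordering proposed above can be sketched roughly as follows. This is only a standalone model using std::mutex, not the server code: BinlogModel, mark_xid_done_waiting, cond_xid_list and run_stress() are hypothetical names, and the early return in mark_xid_done() when the flag is set is an assumption about how reset_master_pending is used. The wait that the real reset_logs() performs while holding LOCK_log is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of the two locks and the proposed counter.
struct BinlogModel {
  std::mutex lock_log;                    // stands in for LOCK_log
  std::mutex lock_xid_list;               // stands in for LOCK_xid_list
  std::condition_variable cond_xid_list;  // assumed: signals counter changes
  bool reset_master_pending = false;      // guarded by lock_xid_list
  int mark_xid_done_waiting = 0;          // guarded by lock_xid_list (assumed name)
  int resets_done = 0;                    // guarded by lock_xid_list

  void mark_xid_done() {
    std::unique_lock<std::mutex> xid(lock_xid_list);
    if (reset_master_pending) return;  // assumption: skip work during a reset
    ++mark_xid_done_waiting;           // announce: about to drop and re-take locks
    xid.unlock();                      // the window described in the report
    std::unique_lock<std::mutex> log(lock_log);
    xid.lock();                        // re-acquire in the order LOCK_log, LOCK_xid_list
    --mark_xid_done_waiting;
    cond_xid_list.notify_all();        // wake a reset_logs() waiting for zero
  }

  void reset_logs() {
    {
      std::unique_lock<std::mutex> xid(lock_xid_list);
      reset_master_pending = true;     // set the flag BEFORE taking LOCK_log
      // Wait until no mark_xid_done() is between its unlock and re-lock.
      cond_xid_list.wait(xid, [this] { return mark_xid_done_waiting == 0; });
    }
    std::lock_guard<std::mutex> log(lock_log);
    // The real reset_logs() would wait and reset state here; omitted.
    std::lock_guard<std::mutex> xid(lock_xid_list);
    reset_master_pending = false;
    ++resets_done;
  }
};

// Hypothetical stress driver: workers call mark_xid_done() while the calling
// thread calls reset_logs(); returns the number of completed resets.
int run_stress(int workers, int iterations) {
  BinlogModel binlog;
  std::vector<std::thread> threads;
  for (int w = 0; w < workers; ++w)
    threads.emplace_back([&binlog, iterations] {
      for (int i = 0; i < iterations; ++i) binlog.mark_xid_done();
    });
  for (int i = 0; i < iterations; ++i) binlog.reset_logs();
  for (auto& t : threads) t.join();
  assert(binlog.mark_xid_done_waiting == 0);  // every window was closed
  return binlog.resets_done;
}
```

Once the flag is set, no new mark_xid_done() enters the window, and the counter lets reset_logs() wait out any call already in it. LOCK_log is therefore only taken when nothing is about to block on it from that path.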