[MDEV-7121] Parallel slave may hang if master crashes in the middle of writing transaction to binlog Created: 2014-11-17  Updated: 2014-11-17  Resolved: 2014-11-17

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.14
Fix Version/s: 10.0.15

Type: Bug Priority: Major
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: parallelslave, replication

Issue Links:
Relates
relates to MDEV-7079 rpl.rpl_parallel_temptable fails spor... Closed

 Description   

This bug happens on the slave when a binlog from the master ends with a
partially written event group (for example, one that has BEGIN but no COMMIT).
Such a partial event group occurs if the master crashes in the middle of
writing to the binlog.

The slave detects this when the restart format description event in the
following binlog file is received. A worker thread that is in the middle of
replicating the partial event group must be notified so that it can roll back
the transaction.
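
As a rough illustration of the detection step described above, the following
toy model scans a stream of events and reports each group that was cut off by
the start of a new binlog file. It is only a sketch: the event names and the
function find_truncated_groups are hypothetical and do not correspond to the
actual replication code, which works on real binlog events rather than strings.

```python
from typing import List, Optional

# Hypothetical event markers for this toy model only.
BEGIN = "BEGIN"
COMMIT = "COMMIT"
FORMAT_DESCRIPTION = "FORMAT_DESCRIPTION"  # stands in for the start of a new binlog file


def find_truncated_groups(events: List[str]) -> List[int]:
    """Return the index of each BEGIN whose group was cut off by a new binlog file."""
    truncated: List[int] = []
    open_begin: Optional[int] = None  # index of the BEGIN of the group in progress
    for i, ev in enumerate(events):
        if ev == BEGIN:
            open_begin = i
        elif ev == COMMIT:
            open_begin = None
        elif ev == FORMAT_DESCRIPTION and open_begin is not None:
            # A new file started while a group was still open: the master
            # must have stopped writing mid-transaction, so the group has
            # to be rolled back by whoever is applying it.
            truncated.append(open_begin)
            open_begin = None
    return truncated


stream = [
    FORMAT_DESCRIPTION, BEGIN, "row", COMMIT,  # complete group
    BEGIN, "row",                              # master crashed here
    FORMAT_DESCRIPTION, BEGIN, "row", COMMIT,  # next binlog file after restart
]
print(find_truncated_groups(stream))  # [4]
```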

The bug was that this notification could be lost, depending on thread
scheduling. If lost, the worker thread would then wait indefinitely for the
rest of the transaction to arrive, and the SQL thread in turn would wait for
the worker thread to complete the rollback, deadlocking the slave.
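
The general pattern for making such a notification robust against thread
scheduling can be sketched as follows. This is not the actual fix from the
patch; it is a minimal illustration, with hypothetical names (GroupState,
signal_abort, wait_for_rollback), of recording the request in shared state
under a lock so that a worker that starts waiting after the signal still
sees it.

```python
import threading


class GroupState:
    """Shared state between the coordinating thread and one worker (hypothetical)."""

    def __init__(self) -> None:
        self.cond = threading.Condition()
        self.queued_events = 0        # events still to be applied by the worker
        self.abort_requested = False  # set when the event group is known to be partial
        self.rolled_back = False      # set by the worker once it has rolled back


def worker(state: GroupState) -> None:
    with state.cond:
        # The predicate is re-checked under the lock, so a flag that was set
        # before the worker began waiting is not missed.
        while state.queued_events == 0 and not state.abort_requested:
            state.cond.wait()
        if state.abort_requested:
            state.rolled_back = True
            state.cond.notify_all()


def signal_abort(state: GroupState) -> None:
    with state.cond:
        state.abort_requested = True  # persistent flag, not just a one-shot wakeup
        state.cond.notify_all()


def wait_for_rollback(state: GroupState, timeout: float) -> bool:
    with state.cond:
        return state.cond.wait_for(lambda: state.rolled_back, timeout=timeout)


state = GroupState()
signal_abort(state)  # worst case: the signal arrives before the worker waits
t = threading.Thread(target=worker, args=(state,))
t.start()
ok = wait_for_rollback(state, timeout=5.0)
t.join(timeout=5.0)
print(ok)  # True
```

If the abort were delivered only as a bare wakeup with no flag, the ordering
used above (signal first, wait second) would leave the worker blocked forever,
which is the shape of the hang described in this report.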

This bug is likely what was seen by a user in a hard-to-reproduce hang.

It is also the cause of the sporadic failure in Buildbot in MDEV-7079.



 Comments   
Comment by Kristian Nielsen [ 2014-11-17 ]

http://lists.askmonty.org/pipermail/commits/2014-November/007034.html
