MariaDB Server / MDEV-7121

Parallel slave may hang if master crashes in the middle of writing transaction to binlog

    Details

      Description

      This bug happens on the slave when a binlog from the master ends with a
      partially written event group (for example, one that has BEGIN but is
      missing COMMIT). Such a partial event group occurs if the master crashes
      in the middle of writing to the binlog.

      The slave detects this when it receives the restart format description
      event in the following binlog file. A worker thread that is in the middle
      of replicating the partial event group must then be notified so that it
      can roll back the transaction.

      The bug was that this notification could be lost, depending on thread
      scheduling. If lost, the worker thread would then wait indefinitely for the
      rest of the transaction to arrive, and the SQL thread in turn would wait for
      the worker thread to complete the rollback, deadlocking the slave.
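
The hang described above is a lost-wakeup problem, and the usual remedy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MariaDB source or the actual fix: `WorkerQueue`, `abort_partial_group`, `replicate` and the event names are all hypothetical. The point it shows is that the abort notification is stored as a flag under the same lock as the event queue, and the worker re-checks that flag in its wait predicate, so the notification cannot be missed regardless of thread scheduling.

```python
import threading
from collections import deque


class WorkerQueue:
    """Hypothetical stand-in for a parallel-replication worker's event queue."""

    def __init__(self):
        self.cond = threading.Condition()
        self.events = deque()
        self.group_aborted = False  # persistent flag, not just a one-off signal
        self.outcome = None

    def enqueue(self, event):
        with self.cond:
            self.events.append(event)
            self.cond.notify()

    def abort_partial_group(self):
        # Set the flag under the lock so a worker that has not started
        # waiting yet still observes it when it checks the predicate.
        with self.cond:
            self.group_aborted = True
            self.cond.notify()

    def run_worker(self):
        in_group = False
        while True:
            with self.cond:
                # Predicate is re-checked under the lock on every wakeup.
                while not self.events and not self.group_aborted:
                    self.cond.wait()
                event = self.events.popleft() if self.events else None
            if event is None:
                # Queue drained and the abort flag is set.
                self.outcome = "rolled back" if in_group else "idle"
                return
            if event == "BEGIN":
                in_group = True
            elif event == "COMMIT":
                self.outcome = "committed"
                return
            # Any other event is treated as a row event and simply consumed.


def replicate(events):
    """Feed events to a worker; FORMAT_DESCRIPTION marks a master restart."""
    queue = WorkerQueue()
    worker = threading.Thread(target=queue.run_worker)
    worker.start()
    for event in events:
        if event == "FORMAT_DESCRIPTION":
            queue.abort_partial_group()
        else:
            queue.enqueue(event)
    worker.join(timeout=5)
    return queue.outcome, not worker.is_alive()
```

With this structure, `replicate(["BEGIN", "ROW", "FORMAT_DESCRIPTION"])` ends with the worker rolling back instead of waiting for a COMMIT that never arrives. A design that only signals the condition variable, without a flag the worker checks before waiting, would be exposed to the scheduling-dependent hang described in this report.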

      This bug is likely what was seen by a user in a hard-to-reproduce hang.

      It is also the cause of the sporadic failure in Buildbot in MDEV-7079.

              People

              Assignee:
              knielsen Kristian Nielsen
              Reporter:
              knielsen Kristian Nielsen