MariaDB Server / MDEV-7882

Excessive transaction retry in parallel replication




      This problem was discovered as part of MDEV-7847, but the two are logically
      distinct problems (slave threads hanging vs. excessive transaction retry), so
      it is filed as a separate bug to keep them apart.

      If conflicting transactions T1 and T2 are run in parallel, with T1 ahead of
      T2 in commit order, then we may need to deadlock kill T2 if it is holding a
      row lock that T1 needs. However, there is no guarantee that T1 will get the
      lock when T2 is rolled back. If we are unlucky, the retry of T2 may re-take
      the lock first, requiring another deadlock kill and retry.
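
A minimal sketch of why T2, and not T1, is the transaction that gets killed, assuming T1 precedes T2 in the commit order that parallel replication must preserve. ReplTxn and must_deadlock_kill are hypothetical names used only for illustration; they are not server APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplTxn:
    """A replicated transaction and its position in the commit order (hypothetical)."""
    name: str
    seq_no: int  # transactions must commit in increasing seq_no


def must_deadlock_kill(waiter: ReplTxn, holder: ReplTxn) -> bool:
    """Return True if `holder` must be killed so that `waiter` can get its row lock.

    If the holder comes later in commit order, it cannot commit (and release the
    lock) until the waiter has committed, while the waiter is blocked on the
    holder's lock: a deadlock that only killing the holder can break.
    If the holder comes earlier, the waiter simply waits for it to commit.
    """
    return holder.seq_no > waiter.seq_no


t1 = ReplTxn("T1", seq_no=1)
t2 = ReplTxn("T2", seq_no=2)
print(must_deadlock_kill(waiter=t1, holder=t2))  # True: T2 holds a lock T1 needs
print(must_deadlock_kill(waiter=t2, holder=t1))  # False: T2 just waits for T1
```

Note that killing T2 only releases the lock; nothing in this condition decides who wins the lock afterwards, which is exactly the race described above.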

      In fact, in the scenario that uncovered MDEV-7847, as well as in testing
      while working on that bug, we easily saw T2 being retried 10 times in cases
      where many conflicting transactions were executed in parallel. This
      typically stops replication with an error, since 10 is the default maximum
      number of retries allowed (slave_transaction_retries).

      In 10.1 "optimistic" mode, this problem is already taken care of: after the
      first deadlock kill of T2, it executes wait_for_prior_commit() before
      retrying. This ensures that any earlier transactions that might conflict
      are allowed to get their locks and complete before T2 is retried, thus
      avoiding the need for multiple retries.

      So in "conservative" mode (and in 10.0), we should simply do the same wait
      before retrying T2. In conservative mode conflicts are very rare, so there
      is no performance reason not to do it, and it avoids this potential problem
      of excessive retries.
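
A rough Python model of the retry loop, with and without the wait proposed above. CommitOrder, apply_with_retry, DeadlockKilled, RetryLimitExceeded and the wait_before_retry flag are hypothetical stand-ins; only the name wait_for_prior_commit() comes from this issue, and this is not the server's actual C++ code. Row locks and the rollback are not modelled: execute() simply raises DeadlockKilled whenever the transaction is chosen as a victim. The max_retries default of 10 mirrors the default retry limit mentioned above.

```python
import threading


class DeadlockKilled(Exception):
    """Raised when the transaction is chosen as the victim of a deadlock kill."""


class RetryLimitExceeded(Exception):
    """Raised when a transaction is retried more than max_retries times."""


class CommitOrder:
    """Hypothetical tracker: transactions commit strictly in seq_no order."""

    def __init__(self):
        self._last_committed = 0
        self._cond = threading.Condition()

    @property
    def last_committed(self):
        with self._cond:
            return self._last_committed

    def mark_committed(self, seq_no):
        # Record that seq_no (and, given in-order commit, everything before it) committed.
        with self._cond:
            self._last_committed = seq_no
            self._cond.notify_all()

    def wait_for_prior_commit(self, seq_no):
        # Block until every transaction ordered before seq_no has committed.
        with self._cond:
            self._cond.wait_for(lambda: self._last_committed >= seq_no - 1)


def apply_with_retry(seq_no, execute, order, max_retries=10, wait_before_retry=True):
    """Apply one replicated transaction, retrying after deadlock kills.

    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        try:
            execute()  # may raise DeadlockKilled; the rollback is not modelled
        except DeadlockKilled:
            retries += 1
            if retries > max_retries:
                raise RetryLimitExceeded(f"gave up after {max_retries} retries")
            if wait_before_retry:
                # The proposed fix: let all earlier (possibly conflicting)
                # transactions take their locks and commit before retrying.
                order.wait_for_prior_commit(seq_no)
            continue
        # Parallel replication still commits in the original order.
        order.wait_for_prior_commit(seq_no)
        order.mark_committed(seq_no)
        return retries
```

With wait_before_retry=False and a T1 that has not committed yet, an execute() for T2 that keeps being killed exhausts the 10 retries and raises RetryLimitExceeded, mirroring the replication stop described above. With the wait enabled, in a toy run where T2 is only killed while T1 is still pending, T2 needs at most one retry.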




            knielsen Kristian Nielsen