MariaDB Server / MDEV-29293

MariaDB stuck on starting commit state (waiting on commit order critical section)

Details

    Description

      In a Galera Cluster environment with 6 MariaDB nodes, 1 arbitrator node, some replicas, and ProxySQL, a network issue triggered a state transfer on two nodes.
      Afterwards, for reasons we could not determine, almost all transactions on the affected node hung in one of these states:

      • "starting" state, on the COMMIT statement or on an empty statement ("").
      • "acquiring total order isolation" state, on the "KILL CONNECTION" statement (the "KILL CONNECTION" was requested by ProxySQL).

      We tried to restart the service, but it hung while stopping. ProxySQL detected the affected node as down and switched the traffic to another node.
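      To quantify how many sessions are in the two states described above, a saved process list can be tallied by state. The following is a minimal sketch, not part of the original report: it assumes the rows are already available as dictionaries keyed by the SHOW PROCESSLIST column names, and the helper name `count_stuck` and the sample rows are hypothetical.

```python
from collections import Counter

# The two thread states reported as hanging in this issue.
STUCK_STATES = {"starting", "acquiring total order isolation"}


def count_stuck(rows):
    """Count sessions per state, restricted to the states reported as hanging."""
    return Counter(
        row["State"] for row in rows if row.get("State") in STUCK_STATES
    )


# Hypothetical sample rows, for illustration only (not taken from the attachments).
sample = [
    {"Id": 101, "State": "starting", "Info": "COMMIT"},
    {"Id": 102, "State": "starting", "Info": ""},
    {"Id": 103, "State": "acquiring total order isolation", "Info": "KILL CONNECTION 101"},
    {"Id": 104, "State": "Sending data", "Info": "SELECT 1"},
]

if __name__ == "__main__":
    for state, count in count_stuck(sample).items():
        print(f"{state}: {count}")
```

      A count that keeps growing for these two states while other states drain would be consistent with the hang described here.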

      The backtraces suggest a deadlock-like hang in pthread_cond_wait(): the threads appear to be blocked in lock.wait(), called from the enter() function of the commit monitor, while waiting for the commit order critical section.
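      To illustrate the kind of hang suspected here, the following is a simplified toy model of an ordered commit monitor. It is not Galera's actual implementation: the class `CommitOrderMonitor`, its `_last_left` counter, and the `demo` function are hypothetical names chosen for this sketch. Each committer waits on a condition variable until every earlier sequence number has left the critical section. If one sequence number never leaves, every later one waits indefinitely. The sketch uses a timeout only so that the demonstration terminates, whereas a plain pthread_cond_wait() has none.

```python
import threading


class CommitOrderMonitor:
    """Toy model: committers pass through the critical section in seqno order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_left = 0  # highest seqno that has left the critical section

    def enter(self, seqno, timeout=None):
        """Block until every earlier seqno has left. Returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._last_left == seqno - 1, timeout=timeout
            )

    def leave(self, seqno):
        """Mark seqno as finished and wake all waiters so they re-check their turn."""
        with self._cond:
            self._last_left = seqno
            self._cond.notify_all()


def demo():
    """Seqno 2 enters but never leaves, so seqno 3 can never enter."""
    monitor = CommitOrderMonitor()
    results = {}

    def committer(seqno, do_leave):
        entered = monitor.enter(seqno, timeout=0.5)
        results[seqno] = entered
        if entered and do_leave:
            monitor.leave(seqno)

    threads = [
        threading.Thread(target=committer, args=(1, True)),
        threading.Thread(target=committer, args=(2, False)),  # never calls leave()
        threading.Thread(target=committer, args=(3, True)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(demo())
```

      In this model, seqno 3 gives up only because of the timeout. Without it, the thread would stay blocked in the wait, which matches the pattern seen in the backtraces, although the sketch says nothing about why a predecessor would fail to leave in the real server.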

      Unfortunately, we have not found a way to reproduce the problem.

      Attachments

        1. backtraces.txt (315 kB)
        2. innodb_status.txt (67 kB)
        3. process_list.txt (467 kB)
        4. processlist.png (701 kB)
        5. process-list-sample.txt (2 kB)

            People

              sysprg Julius Goryavsky
              williamwelter William Welter
              Votes: 5
              Watchers: 25
