Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.1.38, 10.3.23
    • Fix Version/s: 10.2.35, 10.3.26
    • Component/s: Galera
    • Labels: None
    • Environment: MariaDB Server MariaDB-10.1.38 & Galera cluster 3.29

    Description

      The customer created a test case to simulate a production hang that occurs while running a Percona backup.
      The script should be run on a 3-node Galera cluster without MaxScale, with all nodes acting as masters. The script does the following:

      0) Run ./deadlock.sh <user> <password>
      1) Create a test database and 2 tables.
      2) Insert rows into both tables from node1.
      3) Run a large SELECT query from node1 and run FLUSH TABLES there.
      4) Run inserts into both tables from node2 and node3 at the same time.
      5) Run FLUSH TABLES from node3.
      6) Run inserts into both tables again from node2 and node3 at the same time.
      7) Stop the server on node1.
      8) Check the processlist on both node2 and node3.
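
      The sequence in steps 1-8 can be sketched as an ordered dry-run plan. This is only an illustration and not the attached deadlock.sh, whose contents are not shown in this ticket. The database name `deadlock_test`, the table names `t1` and `t2`, the `val` column, and the SQL fragments are hypothetical placeholders; nothing is executed against a server.

```python
from typing import List, Tuple

# Hypothetical names: the ticket does not show the real schema.
DB = "deadlock_test"
TABLES = ("t1", "t2")

Step = Tuple[str, str]  # (target node or nodes, action)


def build_plan() -> List[Step]:
    """Return reproduction steps 1-8 from the description, in order."""
    inserts = "; ".join(
        f"INSERT INTO {DB}.{t} (val) VALUES ('x')" for t in TABLES
    )
    return [
        ("node1", f"CREATE DATABASE {DB}; create tables {', '.join(TABLES)}"),
        ("node1", inserts),
        ("node1", f"large SELECT on {DB}.{TABLES[0]} and FLUSH TABLES"),
        ("node2+node3", inserts),  # issued on both nodes at the same time
        ("node3", "FLUSH TABLES"),
        ("node2+node3", inserts),  # issued again on both nodes
        ("node1", "stop the server"),
        ("node2+node3", "SHOW PROCESSLIST"),
    ]


if __name__ == "__main__":
    # Print the plan only; this sketch never connects to a database.
    for number, (node, action) in enumerate(build_plan(), start=1):
        print(f"{number}) [{node}] {action}")
```

      Listing the steps this way makes it easy to see which node each statement targets before running anything on a real cluster.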

      Once the script finished, both node2 and node3 were stuck, and nothing could be run until all nodes were restarted.

      Interestingly, node2 and node3 do not get stuck if node1 has wsrep_local_index=0. The customer wants to understand why stopping the server on a node with wsrep_local_index=0 does not cause the other nodes to hang.
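
      Since the behaviour depends on which node has wsrep_local_index=0, it helps to record each node's index before running the test. The following is a minimal sketch, assuming the `mysql` command-line client is installed and that the nodes are reachable under host names you supply; the helper names `parse_local_index` and `local_index` are made up for illustration.

```python
import subprocess

QUERY = "SHOW GLOBAL STATUS LIKE 'wsrep_local_index'"


def parse_local_index(output: str) -> int:
    """Parse tab-separated name/value output as printed by `mysql -N -B`."""
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0].strip().lower() == "wsrep_local_index":
            return int(parts[1])
    raise ValueError("wsrep_local_index not found in output")


def local_index(host: str, user: str, password: str) -> int:
    """Query one node through the mysql client and return its index."""
    result = subprocess.run(
        ["mysql", "-h", host, "-u", user, f"-p{password}",
         "-N", "-B", "-e", QUERY],
        capture_output=True, text=True, check=True,
    )
    return parse_local_index(result.stdout)
```

      Calling `local_index(host, user, password)` for each of the three nodes shows which one currently holds index 0, so a run can be labelled accordingly.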

      Attachments

        1. deadlock.sh
          4 kB
        2. perf102_galera.cnf
          2 kB
        3. perf102_server.cnf
          3 kB
        4. perf202_galera.cnf
          2 kB
        5. perf202_server.cnf
          3 kB
        6. perf402_galera.cnf
          2 kB
        7. perf402_server.cnf
          3 kB

        Activity

          jfdignard JF D added a comment -

          I reproduced this issue on a 3-node cluster running MariaDB 10.3.23 and Galera 3.29 by executing the script (deadlock.sh) on the node with wsrep_local_index=0 (the master node). However, when the script was run on another node where wsrep_local_index is not 0 (a slave), the cluster deadlock did not occur.

          jplindst Jan Lindström (Inactive) added a comment -

          MariaDB 10.4 and later versions are not affected.

          People

            Assignee: jplindst Jan Lindström (Inactive)
            Reporter: allen.lee@mariadb.com Allen Lee (Inactive)
            Votes: 0
            Watchers: 8

