MariaDB Server / MDEV-18641

"Resource deadlock avoided during COMMIT" when a node leaves the group gracefully

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.3.12
    • Fix Version/s: N/A
    • Component/s: Galera
    • Environment: Ubuntu 18.04.2 LTS

    Description

      I have 3 nodes in the cluster (A, B, C) and sysbench running the OLTP write-only test, connecting only to node "A". When node "B" is shut down gracefully with systemctl stop mariadb, sysbench is always immediately interrupted, showing the error lines below (the number of lines varies from run to run).

      Expected: sysbench to be completely unaware of node "B" leaving the group.

      Initializing worker threads...
      Threads started!
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT

      Sysbench invocation:

      sysbench --mysql-user='sbench' \
      --mysql-password='test' \
      --db-driver=mysql \
      --mysql-host=$TEST_HOST_A \
      --mysql-db=$TEST_DB_NAME \
      --time=300 \
      --threads=16 \
      oltp_write_only \
      run
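
      (The invocation above only runs the workload; it assumes the sbtest tables already exist. A minimal sketch of the usual preparation step — the --tables and --table-size values here are illustrative defaults, not taken from the report:)

      sysbench --mysql-user='sbench' \
      --mysql-password='test' \
      --db-driver=mysql \
      --mysql-host=$TEST_HOST_A \
      --mysql-db=$TEST_DB_NAME \
      --tables=1 \
      --table-size=10000 \
      oltp_write_only \
      prepare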

      Here is what any of the other nodes report when node B exits (during the test run):

      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: declaring eb2c2526 at tcp://10.4.153.11:4567 stable
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Node 33129c3a state prim
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: view(view_id(PRIM,33129c3a,32) memb {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         33129c3a,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         eb2c2526,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } joined {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } left {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } partitioned {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         f3789450,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: })
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: save pc into disk
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fc356030-3326-11e9-b854-fa77c4586cf9
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: sent state msg: fc356030-3326-11e9-b854-fa77c4586cf9
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 0 (ip-10-4-20-211)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 1 (ip-10-4-153-11)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Quorum results:
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         version    = 4,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         component  = PRIMARY,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         conf_id    = 29,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         members    = 2/2 (joined/total),
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         act_id     = 17299676,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         last_appl. = 17299555,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         protocols  = 0/9/3 (gcs/repl/appl),
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         group UUID = 3c6c83e4-3286-11e9-945b-e253cb1ea9e0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Flow-control interval: [23, 23]
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Trying to continue unpaused monitor
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: New cluster view: global state: 3c6c83e4-3286-11e9-945b-e253cb1ea9e0:17299676, view# 30: Primary, number of nodes: 2, my index: 0, protocol version 3
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: REPL Protocols: 9 (4, 2)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: Assign initial position for certification: 17299676, protocol version: 4
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Service thread queue flushed.
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 122 [Warning] Aborted connection 122 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 115 [Warning] Aborted connection 115 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 124 [Warning] Aborted connection 124 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:39 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:39 0 [Note] WSREP:  cleaning up f3789450 (tcp://10.4.151.8:4567)

    Activity

            Let me get this clear: so you have 3 machines A, B, C, right? Sysbench is executing transactions against node A, right? When you shut down node B, sysbench exits, right? At that moment there could be a transaction on node A that is replicating its write set to B and C, but as B is shut down, it fails. Now, it could be that sysbench is not prepared for that failure. I do not know the sysbench code that well, as it is not part of our codebase.

            jplindst Jan Lindström (Inactive) added a comment

            I tested this with 10.3.15 using:

            [jplindst@pitbull mysql-test]$ ./mtr galera_3nodes.galera_parallel_apply_3nodes --start-and-exit
            

            That will start a 3-node Galera cluster on the same machine. Then I started sysbench like this:

            ./sysbench --test=oltp_read_write.lua --mysql-socket=/home/jplindst/10.1/mysql-test/var/tmp/mysqld.1.sock --mysql-user=root --max-time=6000 run
            

            And after a few seconds I started shutting down node 2 with:

            sudo /usr/local/mysql/bin/mysqladmin -u root -S /home/jplindst/10.1/mysql-test/var/tmp/mysqld.2.sock shutdown
            

            sysbench did not stop or report any errors.

            jplindst Jan Lindström (Inactive) added a comment - edited
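
            (For reference, the three steps above can be combined into one script. A minimal sketch, assuming Jan's build tree, binary locations, and socket paths — adjust all paths to your environment:)

            #!/bin/sh
            # Start a 3-node Galera cluster on one machine (run from the mysql-test directory).
            ./mtr galera_3nodes.galera_parallel_apply_3nodes --start-and-exit

            # Drive load against node 1 in the background (sysbench path as in Jan's comment).
            ./sysbench --test=oltp_read_write.lua \
              --mysql-socket=$HOME/10.1/mysql-test/var/tmp/mysqld.1.sock \
              --mysql-user=root --max-time=6000 run &

            # Let the workload warm up, then shut down node 2 gracefully.
            sleep 5
            sudo /usr/local/mysql/bin/mysqladmin -u root \
              -S $HOME/10.1/mysql-test/var/tmp/mysqld.2.sock shutdown

            # Wait for sysbench to finish; per the report it should keep running without errors.
            wait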

            sysbench shouldn't be exposed to, or have to be "prepared to handle", anything happening to B or C. As far as it's concerned, there are no B or C; it communicates with A's IP alone. The cluster never loses a member ungracefully. When B or C is brought down (one at a time) for maintenance, the remaining set (A+C or A+B) should continue without any errors reported to the client talking to A. Do you agree?

            We can try again to reproduce (it's been a while) but I want to double-check with you that we are conceptually on the same page. This aside, do you believe that between the version we tried (10.3.12) and yours (10.3.15) a relevant patch might have been applied?

            Thanks!

            slisznia Slawomir Lisznianski added a comment

            Conceptually we are on the same page. I tried to reproduce the problem but could not. There are a number of bug fixes between those two versions, but I could not find anything that would clearly have an effect here.

            jplindst Jan Lindström (Inactive) added a comment

            I get the same issue on MariaDB Galera Cluster 10.1.38.

            [ 472s ] thds: 64 tps: 4758.44 qps: 28550.66 (r/w/o: 0.00/17304.47/11246.18) lat (ms,95%): 21.11 err/s: 0.00 reconn/s: 0.00
            [ 474s ] thds: 64 tps: 4877.21 qps: 29291.74 (r/w/o: 0.00/17853.59/11438.16) lat (ms,95%): 23.10 err/s: 0.00 reconn/s: 0.00
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 12 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 7 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            Error in my_thread_global_end(): 25 threads didn't exit
            

            kam1kaze Oleksii Kravchenko added a comment

            Can you share your configuration files and error logs from all nodes? Furthermore, can you repeat this with --wsrep-debug=ON? I still think this is not a bug. Let me explain.

            Let's assume we have a write transaction on node A. At commit, Galera will send the write set of this transaction to nodes B and C.
            Node B starts to certify this write set, but during certification the node is shut down. Node A, which is waiting for acknowledgement from nodes B and C that they have certified the write set, receives an error that node B is no longer in the cluster (at this point it does not matter whether C has already replied with its certification result). Node B should replay the write set, but as it is no longer in the cluster, it cannot. So the whole transaction is aborted during commit. Clearly, sysbench (what version are you using?) is not prepared for this.

            jplindst Jan Lindström (Inactive) added a comment
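
            (Whether or not the server behaviour here is correct, the benchmark side can be made tolerant of this error. A minimal sketch, assuming sysbench 1.0, whose --mysql-ignore-errors option treats the listed error codes as non-fatal so the affected transaction is restarted instead of aborting the run; 1180 is appended to the codes sysbench ignores by default:)

            sysbench --mysql-user='sbench' \
            --mysql-password='test' \
            --db-driver=mysql \
            --mysql-host=$TEST_HOST_A \
            --mysql-db=$TEST_DB_NAME \
            --mysql-ignore-errors=1213,1205,1020,1180 \
            --time=300 \
            --threads=16 \
            oltp_write_only \
            run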

            @Jan, re: "I still think this is not a bug. Let me explain." - Let me state for the record that a similar technology (MySQL Cluster) handles this gracefully. As long as quorum remains, it shouldn't really matter whether node B or C is shut down, crashes in the middle of a commit, or has a network timeout. A client talking to node A should not be exposed to a wsrep-detected error under such conditions. To recap the conditions: node A is alive and B or C is alive during the test. The majority of nodes are fine, thus the client should be fine too. If what you think were true, then increasing the number of nodes in the cluster (5, 7, etc.) would increase the probability of a client-exposed error, which would be counter-productive towards achieving fault tolerance.

            slisznia Slawomir Lisznianski added a comment

            I suggest that you use a more recent version of 10.3 and Galera library 25.3.30, try to reproduce with --wsrep-debug=1, and if the problem reproduces, please provide full error logs from all nodes.

            jplindst Jan Lindström (Inactive) added a comment
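
            (For reference, a minimal sketch of enabling the suggested debug output through the server configuration rather than the command line. The wsrep_debug variable is a plain ON/OFF switch in 10.3; the file path below is illustrative, not from the report:)

            # /etc/mysql/mariadb.conf.d/99-wsrep-debug.cnf (path illustrative)
            [mysqld]
            wsrep_debug = ON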

            People

              Assignee: jplindst Jan Lindström (Inactive)
              Reporter: slisznia Slawomir Lisznianski
              Votes: 0
              Watchers: 7

