Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18641

"Resource deadlock avoided during COMMIT" when a node leaves the group gracefully

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.3.12
    • Fix Version/s: 10.3, 10.4
    • Component/s: Galera
    • Labels:
      None
    • Environment:
      Ubuntu 18.04.2 LTS

      Description

      I have 3 nodes in the cluster (A, B, C) and a sysbench running the OLTP write-only test connecting only to node "A". When the node "B" is being shutdown gracefully with systemctl stop mariadb the sysbench is always immediately interrupted, showing the error lines below (varying number depending on the run).

      Expected: sysbench to be completely unaware of node "B" leaving the group.

      Initializing worker threads...
      Threads started!
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT

      Sysbench invocation:

      sysbench --mysql-user='sbench' \
      --mysql-password='test' \
      --db-driver=mysql \
      --mysql-host=$TEST_HOST_A \
      --mysql-db=$TEST_DB_NAME \
      --time=300 \
      --threads=16 \
      oltp_write_only \
      run

      Here is what any of the other nodes report when node B exits (during the test run):

      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: declaring eb2c2526 at tcp://10.4.153.11:4567 stable
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Node 33129c3a state prim
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: view(view_id(PRIM,33129c3a,32) memb {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         33129c3a,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         eb2c2526,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } joined {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } left {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } partitioned {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         f3789450,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: })
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: save pc into disk
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fc356030-3326-11e9-b854-fa77c4586cf9
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: sent state msg: fc356030-3326-11e9-b854-fa77c4586cf9
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 0 (ip-10-4-20-211)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 1 (ip-10-4-153-11)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Quorum results:
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         version    = 4,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         component  = PRIMARY,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         conf_id    = 29,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         members    = 2/2 (joined/total),
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         act_id     = 17299676,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         last_appl. = 17299555,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         protocols  = 0/9/3 (gcs/repl/appl),
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         group UUID = 3c6c83e4-3286-11e9-945b-e253cb1ea9e0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Flow-control interval: [23, 23]
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Trying to continue unpaused monitor
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: New cluster view: global state: 3c6c83e4-3286-11e9-945b-e253cb1ea9e0:17299676, view# 30: Primary, number of nodes: 2, my index: 0, protocol version 3
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: REPL Protocols: 9 (4, 2)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: Assign initial position for certification: 17299676, protocol version: 4
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Service thread queue flushed.
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 122 [Warning] Aborted connection 122 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 115 [Warning] Aborted connection 115 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 124 [Warning] Aborted connection 124 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:39 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:39 0 [Note] WSREP:  cleaning up f3789450 (tcp://10.4.151.8:4567)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jplindst Jan Lindström
                Reporter:
                slisznia Slawomir Lisznianski
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated: