MariaDB Server / MDEV-18641

"Resource deadlock avoided during COMMIT" when a node leaves the group gracefully

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.3.12
    • Fix Version/s: N/A
    • Component/s: Galera
    • Environment: Ubuntu 18.04.2 LTS

    Description

      I have 3 nodes in the cluster (A, B, C) and sysbench running the OLTP write-only test, connecting only to node "A". When node "B" is shut down gracefully with systemctl stop mariadb, sysbench is always immediately interrupted, showing the error lines below (the number of lines varies from run to run).

      Expected: sysbench to be completely unaware of node "B" leaving the group.

      Initializing worker threads...
      Threads started!
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
      FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
      FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT

      Sysbench invocation:

      sysbench --mysql-user='sbench' \
      --mysql-password='test' \
      --db-driver=mysql \
      --mysql-host=$TEST_HOST_A \
      --mysql-db=$TEST_DB_NAME \
      --time=300 \
      --threads=16 \
      oltp_write_only \
      run
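
      (The invocation above only runs the workload; it assumes the sbtest tables already exist. A minimal sketch of the usual preparation step — the --tables and --table-size values here are illustrative defaults, not taken from the report:)

      sysbench --mysql-user='sbench' \
      --mysql-password='test' \
      --db-driver=mysql \
      --mysql-host=$TEST_HOST_A \
      --mysql-db=$TEST_DB_NAME \
      --tables=1 \
      --table-size=10000 \
      oltp_write_only \
      prepare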

      Here is what any of the other nodes report when node B exits (during the test run):

      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: declaring eb2c2526 at tcp://10.4.153.11:4567 stable
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Node 33129c3a state prim
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: view(view_id(PRIM,33129c3a,32) memb {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         33129c3a,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         eb2c2526,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } joined {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } left {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } partitioned {
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         f3789450,0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: })
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: save pc into disk
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fc356030-3326-11e9-b854-fa77c4586cf9
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: sent state msg: fc356030-3326-11e9-b854-fa77c4586cf9
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 0 (ip-10-4-20-211)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 1 (ip-10-4-153-11)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Quorum results:
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         version    = 4,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         component  = PRIMARY,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         conf_id    = 29,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         members    = 2/2 (joined/total),
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         act_id     = 17299676,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         last_appl. = 17299555,
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         protocols  = 0/9/3 (gcs/repl/appl),
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         group UUID = 3c6c83e4-3286-11e9-945b-e253cb1ea9e0
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Flow-control interval: [23, 23]
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Trying to continue unpaused monitor
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: New cluster view: global state: 3c6c83e4-3286-11e9-945b-e253cb1ea9e0:17299676, view# 30: Primary, number of nodes: 2, my index: 0, protocol version 3
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: REPL Protocols: 9 (4, 2)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: Assign initial position for certification: 17299676, protocol version: 4
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Service thread queue flushed.
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 122 [Warning] Aborted connection 122 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 115 [Warning] Aborted connection 115 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 124 [Warning] Aborted connection 124 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
      Feb 18 02:43:39 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:39 0 [Note] WSREP:  cleaning up f3789450 (tcp://10.4.151.8:4567)

    Activity

            Let me get this clear: so you have 3 machines A, B, C, right? Sysbench is executing transactions against node A, right? When you shut down node B, sysbench exits, right? At that moment there could be a transaction on node A that is replicating its write set to B and C, but as B is shut down, it fails. Now, it could be that sysbench is not prepared for that failure. I do not know the sysbench code that well, as it is not part of our codebase.

            jplindst Jan Lindström (Inactive) added a comment

            I tested this with 10.3.15 using:

            [jplindst@pitbull mysql-test]$ ./mtr galera_3nodes.galera_parallel_apply_3nodes --start-and-exit
            

            That will start a 3-node Galera cluster on the same machine. Then I started sysbench like this:

            ./sysbench --test=oltp_read_write.lua --mysql-socket=/home/jplindst/10.1/mysql-test/var/tmp/mysqld.1.sock --mysql-user=root --max-time=6000 run
            

            And after a few seconds I started shutting down node 2 with:

            sudo /usr/local/mysql/bin/mysqladmin -u root -S /home/jplindst/10.1/mysql-test/var/tmp/mysqld.2.sock shutdown
            

            sysbench did not stop or report any errors.

            jplindst Jan Lindström (Inactive) added a comment - edited
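
            (For reference, the three steps above can be combined into one script. A minimal sketch, assuming Jan's build tree, binary locations, and socket paths — adjust all paths to your environment:)

            #!/bin/sh
            # Start a 3-node Galera cluster on one machine (run from the mysql-test directory).
            ./mtr galera_3nodes.galera_parallel_apply_3nodes --start-and-exit

            # Drive load against node 1 in the background (sysbench path as in Jan's comment).
            ./sysbench --test=oltp_read_write.lua \
              --mysql-socket=$HOME/10.1/mysql-test/var/tmp/mysqld.1.sock \
              --mysql-user=root --max-time=6000 run &

            # Let the workload warm up, then shut down node 2 gracefully.
            sleep 5
            sudo /usr/local/mysql/bin/mysqladmin -u root \
              -S $HOME/10.1/mysql-test/var/tmp/mysqld.2.sock shutdown

            # Wait for sysbench to finish; per the report it should keep running without errors.
            wait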

            sysbench shouldn't be exposed to, or have to be "prepared to handle", anything happening to B or C. As far as it's concerned, there are no B or C; it communicates with A's IP alone. The cluster never loses a member ungracefully. When B or C is brought down (one at a time) for maintenance, the remaining set (A+C or A+B) should continue without any errors reported to the client talking to A. Do you agree?

            We can try again to reproduce (it's been a while) but I want to double-check with you that we are conceptually on the same page. This aside, do you believe that between the version we tried (10.3.12) and yours (10.3.15) a relevant patch might have been applied?

            Thanks!

            slisznia Slawomir Lisznianski added a comment

            Conceptually we are on the same page. I tried to reproduce the problem but could not. There are a number of bug fixes between those two versions, but I could not find anything that would clearly have an effect here.

            jplindst Jan Lindström (Inactive) added a comment

            I get the same issue on MariaDB Galera Cluster 10.1.38.

            [ 472s ] thds: 64 tps: 4758.44 qps: 28550.66 (r/w/o: 0.00/17304.47/11246.18) lat (ms,95%): 21.11 err/s: 0.00 reconn/s: 0.00
            [ 474s ] thds: 64 tps: 4877.21 qps: 29291.74 (r/w/o: 0.00/17853.59/11438.16) lat (ms,95%): 23.10 err/s: 0.00 reconn/s: 0.00
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 12 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 7 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            (last message repeated 1 times)
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
            (last message repeated 1 times)
            FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
            Error in my_thread_global_end(): 25 threads didn't exit
            

            kam1kaze Oleksii Kravchenko added a comment

            Can you share your configuration files and error logs from all nodes? Furthermore, can you repeat this with --wsrep-debug=ON? I still think this is not a bug. Let me explain.

            Let's assume we have a write transaction on node A. At commit, Galera will send the write set of this transaction to nodes B and C.
            Node B starts to certify this write set, but during certification the node is shut down. Node A, which is waiting for acknowledgement from nodes B and C that they have certified the write set, receives an error that node B is no longer in the cluster (at this point it does not matter whether C has already replied with its certification result). Node B should replay the write set, but as it is no longer in the cluster, it cannot. So the whole transaction is aborted during commit. Clearly, sysbench (what version are you using?) is not prepared for this.

            jplindst Jan Lindström (Inactive) added a comment
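
            (Whether or not the server behaviour here is correct, the benchmark side can be made tolerant of this error. A minimal sketch, assuming sysbench 1.0, whose --mysql-ignore-errors option treats the listed error codes as non-fatal so the affected transaction is restarted instead of aborting the run; 1180 is appended to the codes sysbench ignores by default:)

            sysbench --mysql-user='sbench' \
            --mysql-password='test' \
            --db-driver=mysql \
            --mysql-host=$TEST_HOST_A \
            --mysql-db=$TEST_DB_NAME \
            --mysql-ignore-errors=1213,1205,1020,1180 \
            --time=300 \
            --threads=16 \
            oltp_write_only \
            run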

            @Jan, re: "I still think this is not a bug. Let me explain." - Let me state for the record that a similar technology (MySQL Cluster) handles this gracefully. As long as quorum remains, it shouldn't really matter whether node B or C is shut down, crashes in the middle of a commit, or has a network timeout. A client talking to node A should not be exposed to a wsrep-detected error under such conditions. To recap the conditions: node A is alive and B or C is alive during the test. The majority of nodes are fine, thus the client should be fine too. If what you think were true, then increasing the number of nodes in the cluster (5, 7, etc.) would increase the probability of a client-exposed error, which would be counter-productive towards achieving fault tolerance.

            slisznia Slawomir Lisznianski added a comment

            I suggest that you use a more recent version of 10.3 and Galera library 25.3.30, try to reproduce with --wsrep-debug=1, and if the problem reproduces, please provide full error logs from all nodes.

            jplindst Jan Lindström (Inactive) added a comment
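
            (For reference, a minimal sketch of enabling the suggested debug output through the server configuration rather than the command line. The wsrep_debug variable is a plain ON/OFF switch in 10.3; the file path below is illustrative, not from the report:)

            # /etc/mysql/mariadb.conf.d/99-wsrep-debug.cnf (path illustrative)
            [mysqld]
            wsrep_debug = ON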

            People

              Assignee: jplindst Jan Lindström (Inactive)
              Reporter: slisznia Slawomir Lisznianski
              Votes: 0
              Watchers: 7

