[MDEV-18641] "Resource deadlock avoided during COMMIT" when a node leaves the group gracefully Created: 2019-02-19  Updated: 2020-09-25  Resolved: 2020-09-25

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.12
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Slawomir Lisznianski Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Ubuntu 18.04.2 LTS


Issue Links:
Relates
relates to MDEV-16936 Master and Galera Cluster as slave Closed
relates to MDEV-19572 Async slave on Galera node stops due ... Closed

 Description   

I have 3 nodes in the cluster (A, B, C) and sysbench running the OLTP write-only test, connecting only to node "A". When node "B" is shut down gracefully with systemctl stop mariadb, sysbench is always immediately interrupted, showing the error lines below (the number varies between runs).

Expected: sysbench to be completely unaware of node "B" leaving the group.

Initializing worker threads...
Threads started!
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:405: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT

Sysbench invocation:

sysbench --mysql-user='sbench' \
--mysql-password='test' \
--db-driver=mysql \
--mysql-host=$TEST_HOST_A \
--mysql-db=$TEST_DB_NAME \
--time=300 \
--threads=16 \
oltp_write_only \
run

Here is what the other nodes report when node B exits (during the test run):

Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: declaring eb2c2526 at tcp://10.4.153.11:4567 stable
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Node 33129c3a state prim
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: view(view_id(PRIM,33129c3a,32) memb {
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         33129c3a,0
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         eb2c2526,0
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } joined {
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } left {
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: } partitioned {
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         f3789450,0
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: })
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: save pc into disk
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: forgetting f3789450 (tcp://10.4.151.8:4567)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fc356030-3326-11e9-b854-fa77c4586cf9
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: sent state msg: fc356030-3326-11e9-b854-fa77c4586cf9
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 0 (ip-10-4-20-211)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: STATE EXCHANGE: got state msg: fc356030-3326-11e9-b854-fa77c4586cf9 from 1 (ip-10-4-153-11)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Quorum results:
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         version    = 4,
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         component  = PRIMARY,
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         conf_id    = 29,
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         members    = 2/2 (joined/total),
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         act_id     = 17299676,
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         last_appl. = 17299555,
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         protocols  = 0/9/3 (gcs/repl/appl),
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]:         group UUID = 3c6c83e4-3286-11e9-945b-e253cb1ea9e0
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Flow-control interval: [23, 23]
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Trying to continue unpaused monitor
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: New cluster view: global state: 3c6c83e4-3286-11e9-945b-e253cb1ea9e0:17299676, view# 30: Primary, number of nodes: 2, my index: 0, protocol version 3
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: REPL Protocols: 9 (4, 2)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 23 [Note] WSREP: Assign initial position for certification: 17299676, protocol version: 4
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 0 [Note] WSREP: Service thread queue flushed.
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 122 [Warning] Aborted connection 122 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 115 [Warning] Aborted connection 115 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
Feb 18 02:43:34 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:34 124 [Warning] Aborted connection 124 to db: 'sbtest' user: 'sbench' host: 'ip-10-4-20-121.us-east-2.compute.internal' (Got an error reading communication packets)
Feb 18 02:43:39 ip-10-4-20-211 mysqld[10268]: 2019-02-18  2:43:39 0 [Note] WSREP:  cleaning up f3789450 (tcp://10.4.151.8:4567)



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-05-09 ]

Let me get this clear: you have 3 machines A, B, C, and sysbench is executing transactions against node A, right? When you shut down node B, sysbench exits, right? At that moment there could be a transaction on node A that is replicating its write set to B and C, but as B is shutting down, the replication fails. It could be that sysbench is not prepared for that failure. I do not know the sysbench code well, as it is not part of our codebase.

Comment by Jan Lindström (Inactive) [ 2019-05-09 ]

I tested this with 10.3.15 using:

[jplindst@pitbull mysql-test]$ ./mtr galera_3nodes.galera_parallel_apply_3nodes --start-and-exit

That starts a 3-node Galera cluster on the same machine. Then I started sysbench with:

./sysbench --test=oltp_read_write.lua --mysql-socket=/home/jplindst/10.1/mysql-test/var/tmp/mysqld.1.sock --mysql-user=root --max-time=6000 run

After a few seconds I shut down node 2 with:

sudo /usr/local/mysql/bin/mysqladmin -u root -S /home/jplindst/10.1/mysql-test/var/tmp/mysqld.2.sock shutdown

sysbench did not stop or report any errors.

Comment by Slawomir Lisznianski [ 2019-05-09 ]

sysbench shouldn't be exposed to, or have to be "prepared to handle", anything happening to B or C. As far as it's concerned, there is no B or C; it communicates with A's IP alone. The cluster never loses a member ungracefully. When B or C is brought down (one at a time) for maintenance, the remaining set (A+B or A+C) should continue without any errors being reported to the client talking to A. Do you agree?

We can try again to reproduce (it's been a while) but I want to double-check with you that we are conceptually on the same page. This aside, do you believe that between the version we tried (10.3.12) and yours (10.3.15) a relevant patch might have been applied?

Thanks!

Comment by Jan Lindström (Inactive) [ 2019-05-10 ]

Conceptually, yes. I tried to reproduce the problem but could not. There are a number of bug fixes between those two versions, but I could not find anything that would clearly have an effect here.

Comment by Oleksii Kravchenko [ 2019-05-15 ]

I get the same issue on MariaDB Galera Cluster 10.1.38.

[ 472s ] thds: 64 tps: 4758.44 qps: 28550.66 (r/w/o: 0.00/17304.47/11246.18) lat (ms,95%): 21.11 err/s: 0.00 reconn/s: 0.00
[ 474s ] thds: 64 tps: 4877.21 qps: 29291.74 (r/w/o: 0.00/17853.59/11438.16) lat (ms,95%): 23.10 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
(last message repeated 12 times)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
(last message repeated 1 times)
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
(last message repeated 1 times)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
(last message repeated 7 times)
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
(last message repeated 1 times)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
(last message repeated 1 times)
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
(last message repeated 1 times)
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
(last message repeated 1 times)
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
(last message repeated 1 times)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
FATAL: mysql_stmt_execute() returned error 1180 (Got error 35 "Resource deadlock avoided" during COMMIT) for query 'COMMIT'
(last message repeated 1 times)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:409: SQL error, errno = 1180, state = 'HY000': Got error 35 "Resource deadlock avoided" during COMMIT
Error in my_thread_global_end(): 25 threads didn't exit

Comment by Jan Lindström (Inactive) [ 2019-05-16 ]

Can you share your configuration files and error logs from all nodes? Furthermore, can you repeat this with --wsrep-debug=ON? I still think this is not a bug. Let me explain.

Let's assume we have a write transaction on node A. At commit, Galera sends the write set of this transaction to nodes B and C. Node B starts to certify this write set, but during certification the node is shut down. Node A, which is waiting for acknowledgement from nodes B and C that they have certified the write set, receives an error that node B is no longer in the cluster (at this point it does not matter whether C has already replied with its certification result). Node B should replay the write set, but as it is no longer in the cluster it can't. So the whole transaction is aborted during commit. Clearly, sysbench (which version are you using?) is not prepared for this.
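If the behaviour described above is by design, a client would have to treat error 1180 ("Got error 35 'Resource deadlock avoided' during COMMIT") as a transient failure and re-run the transaction. A minimal, hypothetical sketch of such a retry wrapper (plain Python with the database layer stubbed out, so it illustrates only the retry logic; the error codes are taken from the logs above):

```python
# Hypothetical sketch: retry a transaction whose COMMIT may fail with a
# transient wsrep error such as 1180 ("Got error 35 'Resource deadlock
# avoided' during COMMIT"). The DB layer is stubbed so the retry logic
# can be exercised without a running cluster.

TRANSIENT_ERRNOS = {1180, 1213}  # wsrep commit failure, InnoDB deadlock

class TransientDBError(Exception):
    def __init__(self, errno):
        super().__init__(f"errno {errno}")
        self.errno = errno

def run_with_retry(txn, max_attempts=3):
    """Run `txn` (a callable performing the whole transaction, including
    COMMIT); re-run it when it raises a transient error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except TransientDBError as e:
            if e.errno not in TRANSIENT_ERRNOS or attempt == max_attempts:
                raise
            # Server already rolled the transaction back; safe to re-run.

# Example: a transaction that fails once with 1180, then succeeds.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientDBError(1180)
    return "committed"

print(run_with_retry(flaky_txn))  # prints "committed" on the second attempt
```

Whether clients should be required to do this at all is exactly the point of disagreement in the following comments.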

Comment by Slawomir Lisznianski [ 2019-06-17 ]

@Jan Re: "I still think this is not a bug. Let me explain." Let me state for the record that similar technology (MySQL Cluster) handles this gracefully. As long as quorum remains, it shouldn't matter whether node B or C is shut down, crashes in the middle of a commit, or hits a network timeout. A client talking to node A should not be exposed to a wsrep-detected error under such conditions. To recap the conditions: node A is alive, and B or C is alive during the test. The majority is fine, thus the client should be fine too. If what you think were true, then increasing the number of nodes in the cluster (5, 7, etc.) would increase the probability of a client-exposed error, which would be counterproductive to achieving fault tolerance.

Comment by Jan Lindström (Inactive) [ 2020-08-27 ]

I suggest that you use a more recent version of 10.3 and Galera library 25.3.30, and try to reproduce using --wsrep-debug=1. If the problem reproduces, please provide full error logs from all nodes.
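To collect those diagnostics persistently rather than via the command line, wsrep debug logging can be enabled in the server configuration. A minimal sketch of such a fragment (note the hedge: the variable's accepted values differ between series, which is why both --wsrep-debug=ON and --wsrep-debug=1 appear in this thread; the log path is an example):

```ini
[mysqld]
# Verbose wsrep/Galera logging; value syntax varies between MariaDB series
wsrep_debug=ON
# Example path; collect this file from every node
log_error=/var/log/mysql/error.log
```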

Generated at Thu Feb 08 08:45:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.