[MDEV-26913] Long-running commit on Galera cluster Created: 2021-10-27  Updated: 2022-01-10  Resolved: 2022-01-10

Status: Closed
Project: MariaDB Server
Component/s: Galera, SSL
Affects Version/s: 10.4.13
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Vladislav Loskutov Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Environment: Production

 Description   

Last month we noticed something very strange: COMMIT operations take from 1 to 300 seconds to complete.

The following message appears in the error log, mostly on node #1 and node #2:

2021-10-25  0:00:04 0 [Note] WSREP: (1c855d08-bb08, 'ssl://0.0.0.0:4567') connection to peer eb3c5908-a8e6 with addr ssl://[node #1 or #2 IP]:4567 timed out, no messages seen in PT6S, socket stats: rtt: 665 rttvar: 908 rto: 201000 lost: 0 last_data_recv: 0 cwnd: 3 last_queued_since: 56675 last_delivered_since: 21629907796 send_queue_length: 1 send_queue_bytes: 80 segment: 0 messages: 1
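When many such lines accumulate, it can help to pull the peer ID, the silence period, and the socket statistics into a structure that can be compared across nodes. The following is a minimal sketch in Python, assuming the line layout shown above; the function name `parse_timeout_line` is hypothetical, and the log does not state the units of the counters, so they are kept as raw integers.

```python
import re

# Log line copied from the report above (the peer address is redacted there).
LINE = (
    "2021-10-25  0:00:04 0 [Note] WSREP: (1c855d08-bb08, 'ssl://0.0.0.0:4567') "
    "connection to peer eb3c5908-a8e6 with addr ssl://[node #1 or #2 IP]:4567 timed out, "
    "no messages seen in PT6S, socket stats: rtt: 665 rttvar: 908 rto: 201000 lost: 0 "
    "last_data_recv: 0 cwnd: 3 last_queued_since: 56675 last_delivered_since: 21629907796 "
    "send_queue_length: 1 send_queue_bytes: 80 segment: 0 messages: 1"
)


def parse_timeout_line(line):
    """Return peer id, address, silence period and socket stats, or None."""
    head = re.search(
        r"connection to peer (\S+) with addr (.+?) timed out, "
        r"no messages seen in (\w+)",
        line,
    )
    if head is None:
        return None  # not a peer-timeout message
    # Everything after "socket stats:" is a flat list of "name: integer" pairs.
    _, _, tail = line.partition("socket stats:")
    stats = {key: int(value) for key, value in re.findall(r"(\w+): (\d+)", tail)}
    return {
        "peer": head.group(1),
        "addr": head.group(2),
        "period": head.group(3),  # ISO 8601 duration string, e.g. PT6S
        "stats": stats,
    }


info = parse_timeout_line(LINE)
print(info["peer"], info["period"], info["stats"]["rto"])
```

Collecting these records per peer and per timestamp makes it easier to see whether the timeouts coincide with the slow COMMITs.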

The slow log contains the following entry (only on node #1, which is the node the application connects to):

# Time: 211024 23:57:29
# User@Host: ***[***] @ *** [***]
# Thread_id: 2747244  Schema: stock  QC_hit: No
# Query_time: 154.342803  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0
# Rows_affected: 0  Bytes_sent: 11
SET timestamp=1635109049;
COMMIT;
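To see how often this happens, the slow log can be scanned for COMMIT statements whose Query_time exceeds a threshold. The following is a rough Python sketch that assumes the entry layout shown above; the function name `slow_commits` and the 1-second default threshold are illustrative choices, not part of any MariaDB tool.

```python
import re

# Header and statement lines copied from the slow log entry above.
SAMPLE = """\
# Thread_id: 2747244  Schema: stock  QC_hit: No
# Query_time: 154.342803  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0
# Rows_affected: 0  Bytes_sent: 11
SET timestamp=1635109049;
COMMIT;
"""


def slow_commits(text, threshold_s=1.0):
    """Return (thread_id, query_time) for each COMMIT at or above threshold_s."""
    found = []
    thread_id = None
    query_time = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"# Thread_id: (\d+)", line)
        if match:
            thread_id = int(match.group(1))
            continue
        match = re.match(r"# Query_time: ([\d.]+)", line)
        if match:
            query_time = float(match.group(1))
            continue
        is_commit = line.upper().rstrip(";") == "COMMIT"
        if is_commit and query_time is not None and query_time >= threshold_s:
            found.append((thread_id, query_time))
            query_time = None  # do not reuse the header for a later statement
    return found


print(slow_commits(SAMPLE))
```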

What should we do? This is our main production database, and we need to find a way to avoid such long-running COMMITs.

CPU and RAM usage are normal, nothing has changed, and the type and amount of load are the same as before.



 Comments   
Comment by Jan Lindström (Inactive) [ 2021-12-09 ]

Loskutov, can you please try a more recent MariaDB version and, if the problem reproduces, provide more information: error logs, configuration, and the full output of thread apply all bt from a debugger attached to the server. Use the debug symbol package for this.
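One possible way to capture the requested backtraces is to run gdb in batch mode against the server process. The sketch below wraps that in Python; it assumes gdb and the debug symbols are installed and that the caller already knows the server PID. The helper names `build_gdb_command` and `dump_backtraces` are hypothetical. Note that attaching a debugger pauses the server while the backtraces are collected.

```python
import subprocess


def build_gdb_command(pid):
    """Build a gdb invocation that attaches, prints all stacks, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid, out_path):
    """Run gdb against the given PID and write its output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_gdb_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,  # raise if gdb could not attach
        )


# Example (PID and file name are placeholders):
# dump_backtraces(12345, "mariadb_backtraces.txt")
```

The resulting file can then be attached to the issue together with the error log and configuration.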
