Details
Bug
Status: Closed
Critical
Resolution: Incomplete
10.4.13
None
Production
Description
Last month we noticed something very strange: COMMIT operations take from 1 to 300 seconds to complete.
The following message appears mostly for node #1 and node #2:
2021-10-25 0:00:04 0 [Note] WSREP: (1c855d08-bb08, 'ssl://0.0.0.0:4567') connection to peer eb3c5908-a8e6 with addr ssl://[node #1 or #2 IP]:4567 timed out, no messages seen in PT6S, socket stats: rtt: 665 rttvar: 908 rto: 201000 lost: 0 last_data_recv: 0 cwnd: 3 last_queued_since: 56675 last_delivered_since: 21629907796 send_queue_length: 1 send_queue_bytes: 80 segment: 0 messages: 1
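To compare these timeouts across nodes, the socket stats in such a line can be extracted with a short script. This is a minimal sketch that assumes the log line layout shown above; the function name `parse_socket_stats` is hypothetical and not part of MariaDB or Galera.

```python
import re


def parse_socket_stats(line):
    """Return the key/value pairs that follow 'socket stats:' as a dict of ints.

    Assumes the 'name: number' layout seen in the WSREP timeout message above.
    Returns an empty dict if the line has no 'socket stats:' section.
    """
    _, sep, tail = line.partition("socket stats:")
    if not sep:
        return {}
    return {key: int(value) for key, value in re.findall(r"(\w+): (\d+)", tail)}


# Shortened sample taken from the log line above.
sample = (
    "timed out, no messages seen in PT6S, socket stats: "
    "rtt: 665 rttvar: 908 rto: 201000 lost: 0 last_data_recv: 0 cwnd: 3"
)
stats = parse_socket_stats(sample)
print(stats["rto"])  # 201000
```

Collecting these values per peer over time may help show whether the timeouts line up with the slow COMMITs.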
The slow query log shows the following (only on node #1, which is the node the application connects to):
Time: 211024 23:57:29
# User@Host: ***[***] @ *** [***]
# Thread_id: 2747244 Schema: stock QC_hit: No
# Query_time: 154.342803 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
# Rows_affected: 0 Bytes_sent: 11
SET timestamp=1635109049;
COMMIT;
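To see how often COMMIT stalls and for how long, slow log entries like the one above can be filtered by `Query_time`. This is a minimal sketch that assumes the entry layout shown above; the function name `slow_commits` and the `threshold` parameter are hypothetical.

```python
import re

QUERY_TIME = re.compile(r"Query_time:\s*([\d.]+)")


def slow_commits(log_text, threshold=1.0):
    """Return the Query_time (seconds) of each COMMIT that took >= threshold."""
    times = []
    current = None  # Query_time of the entry currently being read
    for raw in log_text.splitlines():
        # Tolerate trailing '|' characters left over from copy/paste.
        line = raw.strip().rstrip("|").strip()
        match = QUERY_TIME.search(line)
        if match:
            current = float(match.group(1))
        elif line.upper().rstrip(";").strip() == "COMMIT" and current is not None:
            if current >= threshold:
                times.append(current)
            current = None
    return times


# Sample entry copied from the slow log excerpt above.
sample = """# Thread_id: 2747244 Schema: stock QC_hit: No
# Query_time: 154.342803 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
# Rows_affected: 0 Bytes_sent: 11
SET timestamp=1635109049;
COMMIT;
"""
print(slow_commits(sample))  # [154.342803]
```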
What should we do? This is our main production database, and we need a way to avoid such long-running COMMITs.
CPU and RAM usage are fine, nothing has been changed, and the type and amount of load are the same as before.