[MDEV-18979] Galera cluster becoming very slow every two to three weeks, leading to max_connections exhaust, and eventually flow control kicking in Created: 2019-03-20  Updated: 2019-05-20  Resolved: 2019-05-20

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.37
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Hartmut Holzgraefe
Assignee: Jan Lindström (Inactive)
Resolution: Not a Bug
Votes: 0
Labels: need_feedback


 Description   

...



 Comments   
Comment by Hartmut Holzgraefe [ 2019-03-21 ]

Two PROCESSLIST/STATUS samples taken ~30s before and after the slowness started show that ~1300 connections are suddenly in the "Opening tables" state. This continues for the next ~6 minutes, then the number goes down again (probably because the write virtual IP was failed over to the other cluster node).
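
For reference, a minimal sketch of how two such samples can be compared, assuming each one was saved as tab-separated SHOW PROCESSLIST output with a header row (as the mysql client prints in batch mode). The helper names and the sample rows are made up for illustration and are not the data from this incident.

```python
import csv
import io
from collections import Counter


def count_states(processlist_tsv):
    """Tally the State column of one tab-separated PROCESSLIST sample."""
    reader = csv.DictReader(
        io.StringIO(processlist_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    # Connections with an empty State (e.g. sleeping ones) are grouped as "(none)".
    return Counter((row.get("State") or "").strip() or "(none)" for row in reader)


def state_delta(before, after):
    """Return {state: change in connection count} for states whose count changed."""
    states = set(before) | set(after)
    return {s: after[s] - before[s] for s in states if after[s] != before[s]}


# Hypothetical samples, invented purely to show the input format.
HEADER = "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n"
SAMPLE_BEFORE = HEADER + (
    "1\tapp\thost1:1\tshop\tQuery\t0\tSending data\tSELECT 1\n"
    "2\tapp\thost1:2\tshop\tSleep\t5\t\tNULL\n"
)
SAMPLE_AFTER = HEADER + (
    "3\tapp\thost1:3\tshop\tQuery\t12\tOpening tables\tSELECT 1\n"
    "4\tapp\thost1:4\tshop\tQuery\t11\tOpening tables\tSELECT 1\n"
    "5\tapp\thost1:5\tshop\tQuery\t11\tOpening tables\tSELECT 1\n"
    "2\tapp\thost1:2\tshop\tSleep\t35\t\tNULL\n"
)

if __name__ == "__main__":
    delta = state_delta(count_states(SAMPLE_BEFORE), count_states(SAMPLE_AFTER))
    for state, change in sorted(delta.items()):
        print(f"{state}: {change:+d}")
```

Running this over each pair of consecutive samples gives a per-state change, which makes a sudden jump in a single state such as "Opening tables" easy to spot.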

A gdb backtrace taken about 6 minutes into the slowdown shows 242 connections waiting for a mutex in tc_add_table_callback(...): mysql_mutex_lock(&element->LOCK_table_share);
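
A count like this can be obtained by scanning the saved backtrace text for threads whose stack contains the function of interest. Below is a rough sketch, assuming the usual "thread apply all bt" layout of a "Thread N (...)" header followed by "#n" frame lines. The helper name and the sample backtrace are invented for illustration and are not taken from the actual dump.

```python
import re

# Matches gdb thread headers such as "Thread 3 (Thread 0x7f01 (LWP 103)):".
THREAD_HEADER = re.compile(r"^Thread\s+(\d+)\s+\(")


def threads_in_function(backtrace_text, function_name):
    """Return gdb thread numbers whose stack has a frame naming function_name."""
    frame = re.compile(r"\b" + re.escape(function_name) + r"\b")
    matches = []
    current = None   # thread number of the stack currently being read
    counted = False  # True once the current thread has been recorded
    for line in backtrace_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            counted = False
        elif (
            current is not None
            and not counted
            and line.lstrip().startswith("#")
            and frame.search(line)
        ):
            matches.append(current)
            counted = True
    return matches


# Hypothetical backtrace excerpt, invented to show the expected layout.
SAMPLE_BACKTRACE = """\
Thread 3 (Thread 0x7f01 (LWP 103)):
#0  __lll_lock_wait ()
#1  pthread_mutex_lock ()
#2  tc_add_table_callback (...)

Thread 2 (Thread 0x7f02 (LWP 102)):
#0  poll ()
#1  handle_connections_sockets ()

Thread 1 (Thread 0x7f03 (LWP 101)):
#0  __lll_lock_wait ()
#1  tc_add_table_callback (...)
"""

if __name__ == "__main__":
    waiting = threads_in_function(SAMPLE_BACKTRACE, "tc_add_table_callback")
    print(f"{len(waiting)} threads in tc_add_table_callback: {waiting}")
```

Each thread is counted at most once, even if the function name appears in several of its frames.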

Unfortunately, there is no PROCESSLIST snapshot close to the time the backtrace was taken. About half a minute earlier the "Opening tables" count was still at ~800 (down from ~1300 for the first time); ~30s later it was down to zero again.
