[MDEV-17612] Galera cluster node hangs - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.3.10
Fix Version/s: N/A
Component/s: Galera
Labels:
- need_feedback
Environment:
3 VMs on VMware 6.7
each node:
8 vCPU
32 GB RAM

Description

Hi all,
for weeks now, 2 of our 3 servers have been hanging.
At first sight the error is that the maximum number of connections is exhausted, but our applications never use more than 10 of the 350 connections allowed per node.
In my opinion there is some problem in the Galera cluster (maybe flow control?) that causes the nodes to hang, and as a consequence the MaxScale load balancer blocks all connections as well.

This is the error from MaxScale:
2018-11-05 06:45:21 error : (635732) [mariadbbackend] Invalid authentication message from backend 'server2'. Error code: 1040, Msg : Too many connections
2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend 'server2' due to authentication failure. Server in state RUNNING SLAVE.
2018-11-05 06:45:21 error : (635733) [mariadbbackend] Invalid authentication message from backend 'server3'. Error code: 1040, Msg : Too many connections
2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend 'server3' due to authentication failure. Server in state RUNNING SLAVE.
2018-11-05 06:45:23 error : Monitor was unable to connect to server [mariadb02.betos.lan]:3306 : "Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 107"
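
To see which backends reject connections and how often, the log excerpt above can be tallied with a small script. This is only a sketch: the regular expression is derived from the lines quoted in this report, and the function name `count_backend_errors` and the `sample_log` list are hypothetical, not part of MaxScale.

```python
import re
from collections import Counter

# Pattern derived from the MaxScale lines quoted in this report (an assumption:
# other MaxScale versions may word the message differently).
BACKEND_ERROR = re.compile(
    r"Invalid authentication message from backend '(?P<server>[^']+)'\. "
    r"Error code: (?P<code>\d+)"
)


def count_backend_errors(lines, code=1040):
    """Count log lines per backend that carry the given MariaDB error code."""
    counts = Counter()
    for line in lines:
        match = BACKEND_ERROR.search(line)
        if match and int(match.group("code")) == code:
            counts[match.group("server")] += 1
    return counts


# Sample input copied from the log excerpt in this report.
sample_log = [
    "2018-11-05 06:45:21 error : (635732) [mariadbbackend] Invalid authentication "
    "message from backend 'server2'. Error code: 1040, Msg : Too many connections",
    "2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend "
    "'server2' due to authentication failure. Server in state RUNNING SLAVE.",
    "2018-11-05 06:45:21 error : (635733) [mariadbbackend] Invalid authentication "
    "message from backend 'server3'. Error code: 1040, Msg : Too many connections",
]

counts = count_backend_errors(sample_log)
print(dict(counts))
```

Running this over the full MaxScale log, grouped by timestamp if needed, would show whether error 1040 appears on both nodes at the same moment, which is what one would expect if the nodes stall together.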

This is the cluster configuration:
[galera]

# Mandatory settings
wsrep_on=on
wsrep_provider= /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address= gcomm://172.31.XXX.XXX,172.31.XXX.XXX,172.31.XXX.XXX
wsrep_provider_options = "gcs.fc_limit=160;gcs.fc_factor=0.8;pc.checksum=false;gcache.size=4096M;pc.bootstrap=YES"
binlog_format = ROW
wsrep_forced_binlog_format = ROW
wsrep_gtid_domain_id = 1
wsrep_gtid_mode = 1
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_sst_auth = XXXX:XXXX
wsrep_debug = OFF
wsrep_log_conflicts = 1
wsrep_slave_threads = 32
wsrep_log_conflicts = 1
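
For reviewing a configuration like the one above, it can help to split `wsrep_provider_options` into individual settings and to flag keys that appear more than once. The following is a minimal sketch under that assumption; the helper names `parse_provider_options` and `duplicate_keys` are hypothetical and nothing here talks to a running server.

```python
def parse_provider_options(value):
    """Split a 'key=value;key=value' provider option string into a dict."""
    options = {}
    for item in value.strip().strip('"').split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        options[key.strip()] = val.strip()
    return options


def duplicate_keys(config_lines):
    """Return setting names that occur more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for line in config_lines:
        line = line.strip()
        # Skip blanks, comments, section headers, and lines without a value.
        if not line or line.startswith(("#", "[")) or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Values copied from the configuration in this report.
provider_options = parse_provider_options(
    '"gcs.fc_limit=160;gcs.fc_factor=0.8;pc.checksum=false;'
    'gcache.size=4096M;pc.bootstrap=YES"'
)
config_excerpt = [
    "[galera]",
    "wsrep_debug = OFF",
    "wsrep_log_conflicts = 1",
    "wsrep_slave_threads = 32",
    "wsrep_log_conflicts = 1",
]

for name, value in provider_options.items():
    print(name, "=", value)
print("duplicated:", duplicate_keys(config_excerpt))
```

Applied to this report, the sketch lists the two flow control settings (`gcs.fc_limit`, `gcs.fc_factor`) separately and shows that `wsrep_log_conflicts` is set twice with the same value.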

Is there any parameter I should tune?

I have another question.

Why does MaxScale block all connections when one node is still alive (not hung)?
Thanks.
Best regards,
Nicola Battista

People

Assignee: Jan Lindström (Inactive)

Reporter: Nicola

Votes: 0

Watchers: 3

Dates

Created: 2018-11-05 07:17

Updated: 2019-06-03 11:30

Resolved: 2019-06-03 11:30
