[MDEV-17612] Galera cluster node hangs Created: 2018-11-05  Updated: 2019-06-03  Resolved: 2019-06-03

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.10
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Nicola
Assignee: Jan Lindström (Inactive)
Resolution: Incomplete
Votes: 0
Labels: need_feedback
Environment:

3 VMs on VMware 6.7
Each node:
8 vCPUs
32 GB RAM



 Description   

Hi all,
For weeks now, 2 of our 3 servers have been hanging.
At first sight the error is that the maximum number of connections is exhausted, but our applications use no more than 10 of the 350 connections allowed per node.
In my opinion there is a problem in the Galera cluster (maybe flow control?) that makes the nodes hang, and as a consequence the MaxScale load balancer also blocks all connections.
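
One way to test the flow control suspicion is to read the wsrep status counters on a node while it still answers queries. The sketch below is a minimal Python helper, assuming the counters were fetched with SHOW GLOBAL STATUS LIKE 'wsrep_%' and stored in a dict. The function name and both thresholds are illustrative assumptions, not recommended values.

```python
def flow_control_suspect(status, paused_limit=0.1, queue_limit=10.0):
    """Return reasons why flow control looks suspicious on one node.

    `status` maps names from SHOW GLOBAL STATUS LIKE 'wsrep_%' to their
    values as strings. Both limits are arbitrary example thresholds.
    """
    reasons = []
    # Fraction of time replication was paused by flow control
    # since the last FLUSH STATUS.
    paused = float(status.get("wsrep_flow_control_paused", 0.0))
    if paused > paused_limit:
        reasons.append(f"paused {paused:.0%} of the time")
    # Average length of the receive queue since the last FLUSH STATUS.
    recv_avg = float(status.get("wsrep_local_recv_queue_avg", 0.0))
    if recv_avg > queue_limit:
        reasons.append(f"recv queue avg {recv_avg:.1f}")
    return reasons
```

An empty list means neither counter crossed its example threshold; it does not rule out other causes of the hang.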

This is the error from MaxScale:
2018-11-05 06:45:21 error : (635732) [mariadbbackend] Invalid authentication message from backend 'server2'. Error code: 1040, Msg : Too many connections
2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend 'server2' due to authentication failure. Server in state RUNNING SLAVE.
2018-11-05 06:45:21 error : (635733) [mariadbbackend] Invalid authentication message from backend 'server3'. Error code: 1040, Msg : Too many connections
2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend 'server3' due to authentication failure. Server in state RUNNING SLAVE.
2018-11-05 06:45:23 error : Monitor was unable to connect to server [mariadb02.betos.lan]:3306 : "Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 107"
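
To see which backends reject connections and how often, log lines in the format above can be tallied per server and error code. This is a minimal sketch; the function name and the regular expression are assumptions derived only from the lines quoted here, and other MaxScale versions may word the message differently.

```python
import re
from collections import Counter

# Matches e.g.: ... from backend 'server2'. Error code: 1040, Msg : ...
BACKEND_ERROR = re.compile(r"from backend '([^']+)'\. Error code: (\d+)")

def count_backend_errors(log_text):
    """Count (backend, error code) pairs found in MaxScale log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BACKEND_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Lines without a backend name and error code, such as the monitor's "Lost connection" message, are ignored by this pattern.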

This is the cluster configuration:
[galera]

# Mandatory settings
wsrep_on=on
wsrep_provider= /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address= gcomm://172.31.XXX.XXX,172.31.XXX.XXX,172.31.XXX.XXX
wsrep_provider_options = "gcs.fc_limit=160;gcs.fc_factor=0.8;pc.checksum=false;gcache.size=4096M;pc.bootstrap=YES"
binlog_format = ROW
wsrep_forced_binlog_format = ROW
wsrep_gtid_domain_id = 1
wsrep_gtid_mode = 1
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_sst_auth = XXXX:XXXX
wsrep_debug = OFF
wsrep_log_conflicts = 1
wsrep_slave_threads = 32
wsrep_log_conflicts = 1

Is there any parameter I should tune?
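
When reviewing the tuning, it can help to split the wsrep_provider_options value from the configuration above into individual settings. The helper below is a minimal sketch; its name is hypothetical and it only handles the simple name=value;name=value form shown in this report.

```python
def parse_provider_options(value):
    """Split a wsrep_provider_options string into a {name: value} dict."""
    options = {}
    for item in value.strip().strip('"').split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, _, setting = item.partition("=")
        options[name.strip()] = setting.strip()
    return options

# The value quoted in the configuration above.
opts = parse_provider_options(
    '"gcs.fc_limit=160;gcs.fc_factor=0.8;pc.checksum=false;'
    'gcache.size=4096M;pc.bootstrap=YES"'
)
```

Of the five settings, gcs.fc_limit and gcs.fc_factor are the ones that control when flow control engages and releases.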

I have another question.

Why does MaxScale block all connections when one node is still alive (not hung)?
Thanks.
Best regards.
Nicola Battista



 Comments   
Comment by Nicola [ 2018-11-09 ]

Hi,
Any news?

Thanks,
Regards.

Comment by Jan Lindström (Inactive) [ 2019-05-06 ]

What is your max_connections configuration value? Could you provide the full configuration and the full, unedited error log?
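
For reference, the requested value can be compared with the connection counters of a node. The sketch below assumes the output of SHOW GLOBAL VARIABLES and SHOW GLOBAL STATUS has already been loaded into two dicts of strings; the function name and the returned keys are illustrative only.

```python
def connection_headroom(variables, status):
    """Summarise how close a node is to its connection limit.

    `variables` holds SHOW GLOBAL VARIABLES values and `status` holds
    SHOW GLOBAL STATUS values, both as {name: string} dicts.
    """
    limit = int(variables["max_connections"])
    current = int(status["Threads_connected"])
    peak = int(status["Max_used_connections"])
    return {
        "limit": limit,
        "current": current,
        "peak": peak,
        "free": limit - current,
        # True if the node has reached its limit since startup.
        "peak_hit_limit": peak >= limit,
    }
```

A peak at the limit while the applications hold only a few connections would suggest that sessions are piling up on the server side rather than being opened in excess by the clients.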
