  1. MariaDB Server
  2. MDEV-17612

Galera cluster node hangs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 10.3.10
    • Fix Version/s: N/A
    • Component/s: Galera
    • Environment: 3 VMs on VMware 6.7, each node with 8 vCPUs and 32 GB RAM

    Description

      Hi all,
      For several weeks, 2 of our 3 servers have been hanging repeatedly.
      At first sight the error is that the maximum number of connections is exhausted, but our applications never open more than 10 connections against a limit of 350 per node.
      In my opinion there is a problem in the Galera cluster (maybe flow control?) that makes the nodes hang, and as a consequence the MaxScale load balancer also blocks all connections.
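One way to test the flow-control hypothesis is to compare a few status counters from a hanging node against thresholds. The following is a minimal sketch, assuming the output of `SHOW GLOBAL STATUS` has already been loaded into a dict of name/value strings. The function name `diagnose` and all thresholds are illustrative assumptions, not Galera recommendations; the status variable names are the standard ones.

```python
def diagnose(status, max_connections=350):
    """Return a list of warnings for one node.

    `status` maps SHOW GLOBAL STATUS variable names to their values.
    All thresholds below are assumptions chosen for illustration.
    """
    warnings = []

    # Fraction of time replication was paused by flow control.
    paused = float(status.get("wsrep_flow_control_paused", 0.0))
    if paused > 0.1:
        warnings.append(f"flow control paused {paused:.0%} of the time")

    # Average length of the queue of received, not yet applied write-sets.
    queue = float(status.get("wsrep_local_recv_queue_avg", 0.0))
    if queue > 1.0:
        warnings.append(f"average receive queue is {queue:.1f}")

    # Client connections compared with the per-node limit (350 in this report).
    threads = int(status.get("Threads_connected", 0))
    if threads >= 0.9 * max_connections:
        warnings.append(f"{threads} of {max_connections} connections in use")

    # A healthy node reports 'Synced'.
    state = status.get("wsrep_local_state_comment", "Synced")
    if state != "Synced":
        warnings.append(f"node state is {state!r}, not 'Synced'")

    return warnings


if __name__ == "__main__":
    # Hypothetical values, not measurements from the affected cluster.
    sample = {
        "wsrep_flow_control_paused": "0.45",
        "wsrep_local_recv_queue_avg": "12.5",
        "Threads_connected": "350",
        "wsrep_local_state_comment": "Synced",
    }
    for warning in diagnose(sample):
        print(warning)
```

If a node shows a high paused fraction together with a saturated connection count while the applications are idle, that would support the idea that connections are piling up behind stalled replication rather than being opened by the applications.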

      This is the error from MaxScale:
      2018-11-05 06:45:21 error : (635732) [mariadbbackend] Invalid authentication message from backend 'server2'. Error code: 1040, Msg : Too many connections
      2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend 'server2' due to authentication failure. Server in state RUNNING SLAVE.
      2018-11-05 06:45:21 error : (635733) [mariadbbackend] Invalid authentication message from backend 'server3'. Error code: 1040, Msg : Too many connections
      2018-11-05 06:45:21 error : [mariadbbackend] Unable to write to backend 'server3' due to authentication failure. Server in state RUNNING SLAVE.
      2018-11-05 06:45:23 error : Monitor was unable to connect to server [mariadb02.betos.lan]:3306 : "Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 107"
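To see which backends reject connections and how often, log lines like the ones above can be tallied per server and error code. This is a minimal sketch that assumes the message layout shown in this report; the regular expression and the function name `count_backend_errors` are illustrative and not part of MaxScale.

```python
import re
from collections import Counter

# Matches the "Invalid authentication message" lines quoted in this report.
BACKEND_ERROR = re.compile(
    r"Invalid authentication message from backend '(?P<server>[^']+)'\. "
    r"Error code: (?P<code>\d+), Msg : (?P<msg>.*)"
)


def count_backend_errors(lines):
    """Count (server, error_code) pairs found in MaxScale log lines."""
    counts = Counter()
    for line in lines:
        match = BACKEND_ERROR.search(line)
        if match:
            counts[(match.group("server"), int(match.group("code")))] += 1
    return counts


if __name__ == "__main__":
    log = [
        "2018-11-05 06:45:21 error : (635732) [mariadbbackend] Invalid authentication message from backend 'server2'. Error code: 1040, Msg : Too many connections",
        "2018-11-05 06:45:21 error : (635733) [mariadbbackend] Invalid authentication message from backend 'server3'. Error code: 1040, Msg : Too many connections",
    ]
    for (server, code), n in sorted(count_backend_errors(log).items()):
        print(server, code, n)
```

Run over a full log file, the counts show whether error 1040 is confined to the two hanging nodes or also appears on the third.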

      This is the cluster configuration:
      [galera]
      # Mandatory settings
        wsrep_on=on
        wsrep_provider= /usr/lib64/galera/libgalera_smm.so
        wsrep_cluster_address= gcomm://172.31.XXX.XXX,172.31.XXX.XXX,172.31.XXX.XXX
        wsrep_provider_options = "gcs.fc_limit=160;gcs.fc_factor=0.8;pc.checksum=false;gcache.size=4096M;pc.bootstrap=YES"
        binlog_format = ROW
        wsrep_forced_binlog_format = ROW
        wsrep_gtid_domain_id = 1
        wsrep_gtid_mode = 1
        default_storage_engine = InnoDB
        innodb_autoinc_lock_mode = 2
        wsrep_sst_auth = XXXX:XXXX
        wsrep_debug = OFF
        wsrep_log_conflicts = 1
        wsrep_slave_threads = 32
        wsrep_log_conflicts = 1
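The flow-control settings in `wsrep_provider_options` above can be read as a pair of queue thresholds. The sketch below assumes the documented meaning of the two options: replication is paused when the receive queue exceeds `gcs.fc_limit` and resumes once the queue drops below `gcs.fc_limit * gcs.fc_factor`. It ignores any adjustment of the limit that the provider may apply at runtime, and the helper names are illustrative.

```python
def parse_provider_options(raw):
    """Split a wsrep_provider_options string into a dict of option -> value."""
    options = {}
    for item in raw.strip().strip('"').split(";"):
        if not item.strip():
            continue  # tolerate a trailing semicolon
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


def flow_control_thresholds(options):
    """Return (pause_above, resume_below) in write-sets, as configured."""
    limit = float(options["gcs.fc_limit"])
    factor = float(options["gcs.fc_factor"])
    return limit, limit * factor


if __name__ == "__main__":
    # The value from the configuration in this report.
    raw = ('"gcs.fc_limit=160;gcs.fc_factor=0.8;pc.checksum=false;'
           'gcache.size=4096M;pc.bootstrap=YES"')
    pause_above, resume_below = flow_control_thresholds(parse_provider_options(raw))
    print(f"pause above {pause_above:.0f}, resume below {resume_below:.0f}")
```

With the values from this configuration, the configured pause threshold is 160 write-sets and the resume threshold is 128.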

      Is there any parameter I should tune?

      I have another question.

      Why does MaxScale block all connections when there is still one node alive (not hung)?
      Thanks.
      Best regards.
      Nicola Battista

          People

            Assignee: jplindst Jan Lindström (Inactive)
            Reporter: nbattista89 Nicola