  1. MariaDB Server
  2. MDEV-26703

WSREP: 0.0 (server1): State transfer to 1.0 (server1) failed: -32 (Broken pipe)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.21
    • Fix Version/s: None
    • Component/s: Galera SST
    • Labels: None
    • Environment: Production

    Description

      We inherited a two-node MariaDB/Galera cluster. A few weeks ago node2 crashed and the storage backing /var/lib/mysql failed; all data on that node was lost.

      The existing configuration appeared to be set up so that restarting the mariadb service on node2 would automatically trigger a Galera SST.

      We've obscured the server names and IP addresses below; the original server names were replaced with node1 and node2.

      On startup, node2 fails with the following messages, which we have not been able to get past:

      Sep 27 15:20:17 node2 mysqld[298270]: 2021-09-27 15:20:17 0 [Warning] WSREP: access file(/var/lib/mysql/data//gvwstate.dat) failed(No such file or directory)
      Sep 27 15:20:17 node2 mysqld[298270]: 2021-09-27 15:20:17 0 [Warning] WSREP: (54a3f6df, 'tcp://0.0.0.0:4567') address 'tcp://xxx.xxx.xxx.xxx:4567' points to own listening address, blacklisting
      Sep 27 15:20:18 node2 mysqld[298270]: 2021-09-27 15:20:18 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
      Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e258ea9b-eea3-11e9-9b31-dfd4c89a799d): 1 (Operation not permitted)
      Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)
      Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
      Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Removing /tmp/tmp.MqsOxdzA9E/xtrabackup_galera_info file due to signal (20210927 15:20:19.441)
      Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 143 143 (20210927 15:20:19.448)
      Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Cleanup after exit with status:32 (20210927 15:20:19.452)
      

      The message that appears on every attempt, and that we are focusing on, is:

      Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)
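
      For triage, the line above can be split into its parts. The following is a minimal Python sketch, not part of the original report; the function name and the regular expression are our own, and it assumes the negative number is a negated errno value (32 is EPIPE on Linux, which matches the "Broken pipe" text). The code alone does not say which side closed the connection or why.

```python
import errno
import re

# Pattern for the Galera "State transfer ... failed" warning quoted above.
STATE_TRANSFER_FAILED = re.compile(
    r"WSREP: (?P<donor_idx>\d+\.\d+) \((?P<donor>[^)]+)\): "
    r"State transfer to (?P<joiner_idx>\d+\.\d+) \((?P<joiner>[^)]+)\) "
    r"failed: (?P<code>-?\d+) \((?P<text>[^)]*)\)"
)


def parse_state_transfer_failure(line):
    """Return donor, joiner, and error details from a failure line, or None."""
    match = STATE_TRANSFER_FAILED.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "donor": match.group("donor"),
        "joiner": match.group("joiner"),
        "code": code,
        # Assumption: the logged number is a negated errno value.
        "errno_name": errno.errorcode.get(abs(code), "unknown"),
        "text": match.group("text"),
    }


if __name__ == "__main__":
    sample = (
        "Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] "
        "WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)"
    )
    print(parse_state_transfer_failure(sample))
```

      Run against the journal on node2, this would show whether every failed attempt names the same donor and the same error code.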
      

      We need to know how to resolve this. MariaDB never starts on node2; the service keeps retrying without success.

      Installed RPM versions (identical on both nodes):

      MariaDB-client-10.3.15-1.el7.centos.x86_64
      MariaDB-backup-10.3.31-1.el7.centos.x86_64
      MariaDB-common-10.3.15-1.el7.centos.x86_64
      MariaDB-server-10.3.21-1.el7.centos.x86_64
      MariaDB-compat-10.3.15-1.el7.centos.x86_64
      galera-25.3.26-1.rhel7.el7.centos.x86_64
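
      The MariaDB packages listed above do not all carry the same version. The sketch below is our own Python illustration, not part of the original report; it groups the packages by version to make that visible. The regular expression assumes the usual name-version-release.arch layout of RPM names. Whether the difference between MariaDB-backup and MariaDB-server plays any role in the SST failure is not established here.

```python
import re

# Package names copied from the list above.
PACKAGES = [
    "MariaDB-client-10.3.15-1.el7.centos.x86_64",
    "MariaDB-backup-10.3.31-1.el7.centos.x86_64",
    "MariaDB-common-10.3.15-1.el7.centos.x86_64",
    "MariaDB-server-10.3.21-1.el7.centos.x86_64",
    "MariaDB-compat-10.3.15-1.el7.centos.x86_64",
    "galera-25.3.26-1.rhel7.el7.centos.x86_64",
]

# Assumed layout: <name>-<version>-<release>.<arch>, with no hyphen in the release part.
RPM_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\d.]*)-(?P<release>[^-]+)$")


def split_rpm(full_name):
    """Split a full RPM name into (package name, version)."""
    match = RPM_NAME.match(full_name)
    if match is None:
        raise ValueError(f"unrecognised RPM name: {full_name}")
    return match.group("name"), match.group("version")


def versions_by_package(packages, prefix="MariaDB-"):
    """Map each package whose name starts with `prefix` to its version."""
    pairs = (split_rpm(p) for p in packages)
    return {name: version for name, version in pairs if name.startswith(prefix)}


def distinct_versions(packages):
    """Return the distinct MariaDB package versions in ascending numeric order."""
    versions = set(versions_by_package(packages).values())
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


if __name__ == "__main__":
    for name, version in sorted(versions_by_package(PACKAGES).items()):
        print(f"{name}: {version}")
    print("distinct versions:", distinct_versions(PACKAGES))
```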
      

      Galera configuration for node1:

      [galera]
      wsrep_on=ON
      wsrep_cluster_name=xxxxxxxxxxxx
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_cluster_address=gcomm://node2_ip_address,haproxy_ip_address
      binlog_format=row
      default_storage_engine=InnoDB
      innodb_autoinc_lock_mode=2
      wsrep_node_name=node1
      wsrep_node_address="node1_ip_address"
      wsrep_sst_method="mariabackup"
      

      Galera configuration for node2 (the third address in wsrep_cluster_address is the HAProxy node):

      [galera]
      wsrep_on=ON
      wsrep_cluster_name=xxxxxxxxxxxxxx
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,haproxy_ip_address"
      binlog_format=row
      default_storage_engine=InnoDB
      innodb_autoinc_lock_mode=2
      wsrep_node_name=node2
      wsrep_node_address="node2_ip_address"
      wsrep_sst_donor="node1"
      wsrep_sst_method="mariabackup"
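
      To see at a glance where the two [galera] sections differ, they can be compared key by key. The following is a rough Python sketch using the standard configparser module; it is our own illustration, the helper names are hypothetical, and the values are the obscured placeholders from this report. The differing wsrep_cluster_name placeholders may only be an artifact of obscuring.

```python
import configparser

NODE1_CNF = """\
[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://node2_ip_address,haproxy_ip_address
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node1
wsrep_node_address="node1_ip_address"
wsrep_sst_method="mariabackup"
"""

NODE2_CNF = """\
[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,haproxy_ip_address"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node2
wsrep_node_address="node2_ip_address"
wsrep_sst_donor="node1"
wsrep_sst_method="mariabackup"
"""


def load_galera_section(text):
    """Parse the [galera] section into a dict, dropping surrounding quotes."""
    parser = configparser.ConfigParser(
        interpolation=None,    # leave any '%' characters in values untouched
        delimiters=("=",),     # keep ':' inside gcomm:// URLs out of key parsing
        allow_no_value=True,   # tolerate bare flags that a real my.cnf may contain
    )
    parser.read_string(text)
    return {key: (value or "").strip('"') for key, value in parser["galera"].items()}


def diff_settings(left, right):
    """Return {key: (left_value, right_value)} for every key that differs."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


if __name__ == "__main__":
    node1 = load_galera_section(NODE1_CNF)
    node2 = load_galera_section(NODE2_CNF)
    for key, (a, b) in diff_settings(node1, node2).items():
        print(f"{key}: node1={a!r} node2={b!r}")
```

      Besides the expected per-node settings, the comparison shows that only node2 sets wsrep_sst_donor and that the two wsrep_cluster_address lists are not the same.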
      

      Note: no error logging is configured on node1.
      Note: we cannot restart node1, as it is the only node serving our production application.

          People

            Assignee: Unassigned
            Reporter: Bob Sislow (bsislow)
            Votes: 1
            Watchers: 1