[MDEV-26703] WSREP: 0.0 (server1): State transfer to 1.0 (server1) failed: -32 (Broken pipe)
Created: 2021-09-27  Updated: 2021-09-27

Status: Open
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.3.21
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Bob Sislow
Assignee: Unassigned
Resolution: Unresolved
Votes: 1
Labels: None
Environment: Production

Description

We inherited a two-node MariaDB/Galera cluster. A few weeks ago node2 crashed and the storage behind its /var/lib/mysql failed; all data on node2 was lost.

The existing configuration appeared to be set up so that restarting the mariadb service on node2 would trigger a Galera SST automatically.

We have obscured the server names and IP addresses below; the servers are referred to as node1 and node2.

On startup, node2 fails with the following, which we have been unable to get past:

Sep 27 15:20:17 node2 mysqld[298270]: 2021-09-27 15:20:17 0 [Warning] WSREP: access file(/var/lib/mysql/data//gvwstate.dat) failed(No such file or directory)
Sep 27 15:20:17 node2 mysqld[298270]: 2021-09-27 15:20:17 0 [Warning] WSREP: (54a3f6df, 'tcp://0.0.0.0:4567') address 'tcp://xxx.xxx.xxx.xxx:4567' points to own listening address, blacklisting
Sep 27 15:20:18 node2 mysqld[298270]: 2021-09-27 15:20:18 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e258ea9b-eea3-11e9-9b31-dfd4c89a799d): 1 (Operation not permitted)
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Removing /tmp/tmp.MqsOxdzA9E/xtrabackup_galera_info file due to signal (20210927 15:20:19.441)
Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 143 143 (20210927 15:20:19.448)
Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Cleanup after exit with status:32 (20210927 15:20:19.452)
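For reference, the numeric codes in the log above can be decoded as follows. This is a minimal sketch, assuming the -32 is a negated Linux errno and that the "exit codes: 143 143" follow the usual shell convention of 128 plus the signal number. Both helper names are hypothetical. The result describes how the transfer ended, not why.

```python
import errno
import os
import signal

def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and text."""
    n = abs(code)
    return errno.errorcode.get(n, "UNKNOWN"), os.strerror(n)

def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Assumption: values above 128 mean the process was terminated by
    signal number (status - 128), as is conventional for POSIX shells.
    """
    if status > 128:
        try:
            return "killed by " + signal.Signals(status - 128).name
        except ValueError:
            return "killed by unknown signal %d" % (status - 128)
    return "exited with status %d" % status

# Values taken from the log lines above.
print(describe_errno(-32))        # the "-32 (Broken pipe)" from WSREP
print(describe_exit_status(143))  # each of the "exit codes: 143 143"
```

Under these assumptions, the joiner-side SST processes were terminated by SIGTERM rather than failing on their own, which would be consistent with the server aborting after "Will never receive state".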

The line we are focusing on, which appears on every attempt, is:

Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)

We need to know how to resolve this. MariaDB never starts on node2; the service keeps retrying without success.

RPM versions; same on both nodes:

MariaDB-client-10.3.15-1.el7.centos.x86_64
MariaDB-backup-10.3.31-1.el7.centos.x86_64
MariaDB-common-10.3.15-1.el7.centos.x86_64
MariaDB-server-10.3.21-1.el7.centos.x86_64
MariaDB-compat-10.3.15-1.el7.centos.x86_64
galera-25.3.26-1.rhel7.el7.centos.x86_64
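The package list mixes several MariaDB releases, and MariaDB-backup (used by the mariabackup SST method) is newer than MariaDB-server. Whether this contributes to the failure is not established in this report. A quick way to make the mismatch explicit, as a sketch with a hypothetical helper name:

```python
import re

# Package names exactly as listed above.
packages = [
    "MariaDB-client-10.3.15-1.el7.centos.x86_64",
    "MariaDB-backup-10.3.31-1.el7.centos.x86_64",
    "MariaDB-common-10.3.15-1.el7.centos.x86_64",
    "MariaDB-server-10.3.21-1.el7.centos.x86_64",
    "MariaDB-compat-10.3.15-1.el7.centos.x86_64",
    "galera-25.3.26-1.rhel7.el7.centos.x86_64",
]

def mariadb_versions(rpm_names):
    """Return {component: version} for the MariaDB-* packages only."""
    versions = {}
    for name in rpm_names:
        m = re.match(r"MariaDB-(\w+)-(\d+\.\d+\.\d+)-", name)
        if m:
            versions[m.group(1)] = m.group(2)
    return versions

versions = mariadb_versions(packages)
# Sort numerically so that e.g. 10.3.9 would come before 10.3.15.
distinct = sorted(set(versions.values()),
                  key=lambda v: tuple(int(p) for p in v.split(".")))

for component, version in sorted(versions.items()):
    print(component, version)
print("distinct MariaDB versions:", distinct)
```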

Galera configuration for node1:

[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://node2_ip_address,haproxy_ip_address
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node1
wsrep_node_address="node1_ip_address"
wsrep_sst_method="mariabackup"

Galera configuration for node2 (the third address in wsrep_cluster_address is the HAProxy node):

[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,haproxy_ip_address"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node2
wsrep_node_address="node2_ip_address"
wsrep_sst_donor="node1"
wsrep_sst_method="mariabackup"
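The two wsrep_cluster_address values differ: node1 lists only node2 and the HAProxy address, while node2 also lists itself, which would be consistent with the "points to own listening address, blacklisting" warning in the log. The sketch below compares the two lists using only the fragments shown above. The function name is hypothetical, and a full my.cnf (for example one with !includedir lines) may not parse this way.

```python
import configparser

# Only the relevant keys from the two fragments above.
NODE1_CNF = """
[galera]
wsrep_cluster_address=gcomm://node2_ip_address,haproxy_ip_address
wsrep_node_address="node1_ip_address"
"""

NODE2_CNF = """
[galera]
wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,haproxy_ip_address"
wsrep_node_address="node2_ip_address"
"""

def cluster_members(cnf_text):
    """Return the address list from wsrep_cluster_address in [galera]."""
    parser = configparser.ConfigParser(interpolation=None,
                                       allow_no_value=True)
    parser.read_string(cnf_text)
    raw = parser["galera"]["wsrep_cluster_address"].strip('"')
    prefix = "gcomm://"
    if raw.startswith(prefix):
        raw = raw[len(prefix):]
    return [addr for addr in raw.split(",") if addr]

node1_members = cluster_members(NODE1_CNF)
node2_members = cluster_members(NODE2_CNF)
print("node1 lists:", node1_members)
print("node2 lists:", node2_members)
print("only in node2's list:", sorted(set(node2_members) - set(node1_members)))
```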

Note 1: no error logging is configured on node1.
Note 2: we cannot restart node1, as it is the only node serving our production application.

