Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
10.3.21
-
None
-
None
-
Production
Description
We inherited a 2-node MariaDB/Galera cluster. node2 crashed a few weeks ago and the storage for /var/lib/mysql failed; all data was lost.
The existing configuration appeared to be set up so that simply restarting the mariadb service on node2 would trigger a Galera SST automatically.
We've obscured the server names and IP addresses below; server names were replaced with node1 and node2.
We cannot get past the following errors on startup:
Sep 27 15:20:17 node2 mysqld[298270]: 2021-09-27 15:20:17 0 [Warning] WSREP: access file(/var/lib/mysql/data//gvwstate.dat) failed(No such file or directory)
Sep 27 15:20:17 node2 mysqld[298270]: 2021-09-27 15:20:17 0 [Warning] WSREP: (54a3f6df, 'tcp://0.0.0.0:4567') address 'tcp://xxx.xxx.xxx.xxx:4567' points to own listening address, blacklisting
Sep 27 15:20:18 node2 mysqld[298270]: 2021-09-27 15:20:18 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e258ea9b-eea3-11e9-9b31-dfd4c89a799d): 1 (Operation not permitted)
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Removing /tmp/tmp.MqsOxdzA9E/xtrabackup_galera_info file due to signal (20210927 15:20:19.441)
Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 143 143 (20210927 15:20:19.448)
Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Cleanup after exit with status:32 (20210927 15:20:19.452)
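For anyone reading the log excerpt, a small sketch of how the "exit codes: 143 143" values can be interpreted. It assumes the common shell convention that an exit status above 128 means "terminated by signal (status - 128)"; under that assumption 143 maps to SIGTERM, which would be consistent with the "due to signal" line just before it. The function names are illustrative only, and this does not by itself identify why the transfer was aborted.

```python
import re
import signal

# Sample line copied from the node2 log excerpt.
LINE = (
    "Sep 27 15:20:19 node2 mysqld[298270]: WSREP_SST: [ERROR] Error while getting "
    "data from donor node: exit codes: 143 143 (20210927 15:20:19.448)"
)


def sst_exit_codes(line):
    """Return the integer exit codes listed after 'exit codes:' in an SST log line."""
    match = re.search(r"exit codes:((?:\s+\d+)+)", line)
    return [int(code) for code in match.group(1).split()] if match else []


def describe_exit_code(code):
    """Describe an exit status, assuming the shell's 128+N signal convention."""
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit status {code} (no matching signal)"
    return f"exit status {code}"


if __name__ == "__main__":
    for code in sst_exit_codes(LINE):
        print(code, "->", describe_exit_code(code))
```

Read this way, the receiving-side processes on node2 appear to have been stopped by a signal rather than failing on their own, so the underlying cause is more likely to be found on the donor side.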
The error we are focusing on, which appears on every attempt, is:
Sep 27 15:20:19 node2 mysqld[298270]: 2021-09-27 15:20:19 0 [Warning] WSREP: 0.0 (node1): State transfer to 1.0 (node2) failed: -32 (Broken pipe)
We need to know how to resolve this. MariaDB never starts on node2; the service keeps retrying without success.
RPM versions; same on both nodes:
MariaDB-client-10.3.15-1.el7.centos.x86_64
MariaDB-backup-10.3.31-1.el7.centos.x86_64
MariaDB-common-10.3.15-1.el7.centos.x86_64
MariaDB-server-10.3.21-1.el7.centos.x86_64
MariaDB-compat-10.3.15-1.el7.centos.x86_64
galera-25.3.26-1.rhel7.el7.centos.x86_64
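The package list mixes three MariaDB versions (10.3.15, 10.3.21 and 10.3.31). Since the SST method is mariabackup, the gap between MariaDB-backup and MariaDB-server may be worth checking. Whether it is related to the failure is not established here. Below is a minimal sketch that flags packages whose version differs from the server package; the helper names are made up for illustration and the input is the list as posted.

```python
import re

# Package list as posted in this report.
PACKAGES = [
    "MariaDB-client-10.3.15-1.el7.centos.x86_64",
    "MariaDB-backup-10.3.31-1.el7.centos.x86_64",
    "MariaDB-common-10.3.15-1.el7.centos.x86_64",
    "MariaDB-server-10.3.21-1.el7.centos.x86_64",
    "MariaDB-compat-10.3.15-1.el7.centos.x86_64",
    "galera-25.3.26-1.rhel7.el7.centos.x86_64",
]


def mariadb_versions(packages):
    """Map each MariaDB-* package name to its x.y.z version string."""
    versions = {}
    for package in packages:
        match = re.match(r"(MariaDB-\w+)-(\d+\.\d+\.\d+)-", package)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def mismatched(versions, reference="MariaDB-server"):
    """Return the packages whose version differs from the reference package."""
    expected = versions.get(reference)
    return {name: ver for name, ver in versions.items() if ver != expected}


if __name__ == "__main__":
    found = mariadb_versions(PACKAGES)
    print("server version:", found["MariaDB-server"])
    for name, version in sorted(mismatched(found).items()):
        print(f"{name} is at {version}")
```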
Galera configuration for node1:
[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://node2_ip_address,haproxy_ip_address
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node1
wsrep_node_address="node1_ip_address"
wsrep_sst_method="mariabackup"
Galera configuration for node2; the third address in wsrep_cluster_address is the HAProxy node:
[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,haproxy_ip_address"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node2
wsrep_node_address="node2_ip_address"
wsrep_sst_donor="node1"
wsrep_sst_method="mariabackup"
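To make the differences between the two [galera] sections easier to see, here is a rough sketch that parses both snippets with Python's configparser and lists the keys whose values differ. The helper names are hypothetical, and the inputs are the redacted values as posted. Because the wsrep_cluster_name placeholders were redacted with different lengths, that key is flagged as well; on the real hosts the cluster name has to be identical on both nodes.

```python
import configparser

NODE1 = """[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://node2_ip_address,haproxy_ip_address
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node1
wsrep_node_address="node1_ip_address"
wsrep_sst_method="mariabackup"
"""

NODE2 = """[galera]
wsrep_on=ON
wsrep_cluster_name=xxxxxxxxxxxxxx
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://node1_ip_address,node2_ip_address,haproxy_ip_address"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_name=node2
wsrep_node_address="node2_ip_address"
wsrep_sst_donor="node1"
wsrep_sst_method="mariabackup"
"""


def galera_options(text):
    """Parse the [galera] section and strip surrounding double quotes from values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {key: value.strip('"') for key, value in parser["galera"].items()}


def cluster_members(options):
    """Split wsrep_cluster_address into its comma-separated member addresses."""
    address = options["wsrep_cluster_address"]
    prefix = "gcomm://"
    if address.startswith(prefix):
        address = address[len(prefix):]
    return address.split(",")


def differing_keys(first, second):
    """Return the sorted keys that are missing on one side or have different values."""
    return sorted(k for k in first.keys() | second.keys() if first.get(k) != second.get(k))


if __name__ == "__main__":
    node1, node2 = galera_options(NODE1), galera_options(NODE2)
    print("differing keys:", differing_keys(node1, node2))
    print("node1 members:", cluster_members(node1))
    print("node2 members:", cluster_members(node2))
```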
Note - we have no error logging configured on the first node.
Second note - we cannot bounce the first node as this is the only one servicing our production application.