[MDEV-9083] Slave IO thread does not handle autoreconnect to restarting Galera Cluster node
Created: 2015-11-04  Updated: 2016-06-12  Resolved: 2016-06-12

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.8, 10.0.22-galera
Fix Version/s: 10.0.26-galera

Type: Bug
Priority: Critical
Reporter: Guillaume Lefranc
Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed
Votes: 1
Labels: None


 Description   

Hello,

Considering the following architecture:

  • N-nodes MariaDB Galera Cluster
  • Standalone MariaDB server replicating from any one of the above cluster nodes

When the master node is restarted, the slave stops with this error:
1593 The slave I/O thread stops because a fatal error is encountered when it tried to SELECT @master_binlog_checksum. Error: WSREP has not yet prepared node for application use

The slave then has to be restarted manually (START SLAVE) in order to reconnect.
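Until a fix is in place, the manual restart can be automated from outside the server. The following is only a minimal sketch of such a watchdog, not part of MariaDB: `needs_restart`, `watchdog_step`, `fetch_status` and `execute` are hypothetical names, and the caller is assumed to supply `fetch_status` (returning one SHOW SLAVE STATUS row as a dict) and `execute` (running a statement on the slave) using whatever client library is at hand.

```python
# Error text reported by a Galera node that is not yet ready for queries.
WSREP_NOT_READY = "WSREP has not yet prepared node for application use"


def needs_restart(status):
    """Return True if the I/O thread stopped on error 1593 caused by WSREP.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict.
    """
    return (
        status.get("Slave_IO_Running") == "No"
        and int(status.get("Last_IO_Errno", 0)) == 1593
        and WSREP_NOT_READY in status.get("Last_IO_Error", "")
    )


def watchdog_step(fetch_status, execute):
    """Issue START SLAVE once if the slave stopped for the WSREP reason.

    Both arguments are caller-supplied callables (hypothetical helpers).
    Returns True if a restart was issued, False otherwise.
    """
    if needs_restart(fetch_status()):
        execute("START SLAVE")
        return True
    return False
```

Calling `watchdog_step` periodically (for example from a scheduler) would restart the slave only for this specific error and leave every other failure for an operator to inspect.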

When a Galera node is restarted, there is a short window during which it accepts connections on port 3306 but cannot yet execute commands, because it is still processing an Incremental State Transfer (IST) request. I therefore propose that the slave I/O thread automatically reconnect instead of stopping when it encounters this particular error.
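The proposed behaviour can be illustrated with a small simulation. This is only a sketch of the idea, not the server code or the eventual fix: `MasterError`, `is_transient`, `run_handshake`, `max_retries` and `retry_delay` are all hypothetical names, and `handshake` stands in for the step where the I/O thread queries the master after connecting.

```python
import time

# Error text reported by a Galera node that is not yet ready for queries.
WSREP_NOT_READY = "WSREP has not yet prepared node for application use"


class MasterError(Exception):
    """Hypothetical error raised when a query against the master fails."""


def is_transient(error):
    """Treat the WSREP 'not yet prepared' error as retryable."""
    return WSREP_NOT_READY in str(error)


def run_handshake(handshake, max_retries=5, retry_delay=1.0, sleep=time.sleep):
    """Call `handshake`, retrying only on the transient WSREP error.

    Any other error, or exhausting `max_retries`, is re-raised, which
    corresponds to the I/O thread stopping as it does today.
    """
    attempts = 0
    while True:
        try:
            return handshake()
        except MasterError as err:
            attempts += 1
            if not is_transient(err) or attempts > max_retries:
                raise
            sleep(retry_delay)  # wait for the node to finish its state transfer
```

The retry is deliberately limited to this one error message, so that genuinely fatal errors still stop the thread immediately.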



 Comments   
Comment by Philip Stoev (Inactive) [ 2016-06-10 ]

A fix and a test for this issue are available in the following merge:

https://github.com/codership/mysql-wsrep-bugs/commit/ec4a6fa61ca6be5424040048d0727ebadf271d00

Note that the fix is (currently) wrapped in WITH_WSREP directives, so it will only take effect in binaries compiled with -DHAVE_WSREP=1
