Description
Summary:
When simulating a loss of network connectivity between replicating servers, MariaDB 10.x does not detect the broken connection if the replication channel uses SSL, whereas 5.5 does. This happens regardless of the value of the "slave_net_timeout" variable.
Reproduced using the 10.0.14 GLIBC214 and 10.0.0 builds, with both GTID (10.0.14) and binlog-based replication (10.0.14/10.0.0). Works as expected on 5.5.40 with binlog-based replication. I am testing with the binary .tar.gz MariaDB builds downloaded from the MariaDB servers (archive.mariadb.org).
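For reference, the timeout and SSL state can be inspected on the slave with standard statements such as the ones below (the 60-second value is only an illustration, not necessarily the value used in testing):

  SET GLOBAL slave_net_timeout = 60;   -- example value only
  SHOW VARIABLES LIKE 'slave_net_timeout';
  SHOW SLAVE STATUS\G
  -- relevant fields: Slave_IO_State, Master_SSL_Allowed, Read_Master_Log_Pos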
Steps to reproduce (functional case):
- Set up two MariaDB 5.5.40 servers in a master-slave configuration and ensure replication is working and SSL-encrypted (an example of the slave-side setup is sketched after this list).
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave will detect the broken connection as documented, and the slave status will now show that it is attempting to reconnect to the master.
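A minimal sketch of the slave-side setup referred to above; the host, user, password and CA path are placeholders rather than the exact values from the test environment, and the binlog coordinates (or MASTER_USE_GTID on 10.0) are omitted for brevity:

  -- On the slave, configure SSL-encrypted replication from the master:
  CHANGE MASTER TO
    MASTER_HOST='<master_ip>',
    MASTER_USER='repl',
    MASTER_PASSWORD='<password>',
    MASTER_SSL=1,
    MASTER_SSL_CA='/path/to/ca-cert.pem';
  START SLAVE;

  -- Confirm replication is running over SSL before simulating the failure:
  SHOW SLAVE STATUS\G
  -- expect Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Master_SSL_Allowed: Yes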
Steps to reproduce (broken case):
- Set up two MariaDB 10.0.14 servers in a master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave status will continue to report "Waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until it is stopped and restarted manually (example statements below); it will not reconnect on its own, contrary to the documentation. This is a change in behavior from MariaDB 5.5 and appears to be incorrect.
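For completeness, a sketch of the manual intervention that clears the stuck state, assuming the iptables rule added earlier is the only thing blocking traffic:

  # On the slave host, restore connectivity to the master:
  iptables -D INPUT -s <master_ip> -j DROP

  -- Then restart the slave threads on the slave:
  STOP SLAVE;
  START SLAVE;
  SHOW SLAVE STATUS\G
  -- Slave_IO_State should return to 'Waiting for master to send event' and the log positions should advance again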