[MDEV-7111] Unable to detect network timeout in 10.x when using SSL (regression from 5.5)

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.0.14, 10.0(EOL)
Fix Version/s: 10.5.26, 10.6.19, 10.11.9, 11.1.6, 11.2.5, 11.4.3
Component/s: Replication, SSL
Labels:
- upstream
- verified
Environment:
Ubuntu Linux 14.04 x86_64

Description

Reproduced using the 10.0.14 GLIBC214 build and 10.0.0 builds, using both GTID (10.0.14) and binlog-based replication (10.0.14/10.0.0). Works as expected on 5.5.40 with binlog-based replication. I am testing with the binary .tar.gz MariaDB builds downloaded from the MariaDB servers (archive.mariadb.org).

Steps to reproduce (functional case):

- Set up MariaDB with two 5.5.40 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave will restart as documented, and the slave status will now state that it is attempting to reconnect to the master.
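
The behaviour expected in the functional case amounts to a read timeout on the replication connection: because DROP discards packets silently, the slave sees neither data nor a connection reset, so only a timer can tell it that the master has gone quiet. The following is a minimal Python sketch of that idea on a plain socket. It is not MariaDB's implementation; the function name `read_event_or_timeout` and the `timeout_s` parameter (standing in for slave_net_timeout) are assumptions made for illustration.

```python
import socket
from typing import Optional


def read_event_or_timeout(sock: socket.socket, timeout_s: float) -> Optional[bytes]:
    """Return the next chunk of data, or None if the peer stays silent for timeout_s.

    Hypothetical helper: timeout_s plays the role of slave_net_timeout here.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # Nothing arrived in time. A replication client would treat this as a
        # lost connection and start reconnecting.
        return None
```

With a silent peer the call returns None after `timeout_s` seconds instead of blocking forever, which corresponds to the reconnect described in the last step above.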

Steps to reproduce (broken case):

- Set up MariaDB with two 10.0.14 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave status will continue to state "waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until it is stopped and restarted; it will not restart on its own, contrary to documentation. This is a change in behavior from MariaDB 5.5 and appears to be incorrect.
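
The stuck state described in the last step can be recognised from outside the server by comparing two SHOW SLAVE STATUS snapshots taken some time apart. The sketch below assumes each snapshot is a dict keyed by the column names `Slave_IO_State`, `Master_Log_File` and `Read_Master_Log_Pos`; the function name and the way the snapshots are obtained are hypothetical and not part of this report.

```python
from typing import Any, Mapping

WAITING_STATE = "waiting for master to send event"


def io_thread_looks_stalled(
    prev: Mapping[str, Any],
    curr: Mapping[str, Any],
    elapsed_s: float,
    slave_net_timeout_s: float,
) -> bool:
    """Heuristic: still 'waiting', no position change, for longer than the timeout.

    prev and curr are assumed to be SHOW SLAVE STATUS rows taken elapsed_s
    seconds apart.
    """
    still_waiting = str(curr["Slave_IO_State"]).lower() == WAITING_STATE
    same_position = (
        prev["Master_Log_File"] == curr["Master_Log_File"]
        and prev["Read_Master_Log_Pos"] == curr["Read_Master_Log_Pos"]
    )
    return still_waiting and same_position and elapsed_s > slave_net_timeout_s
```

An idle master also leaves the positions unchanged, so the check is only meaningful while the master is known to be writing, as in the reproduction steps above. When it fires, stopping and restarting the slave is the recovery described in this report.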

Attachments

Issue Links

links to

Bug #74908 - Unable to detect network timeout in 5.6 when using SSL (regression from 5.5)

Activity


Paul Kreiner created issue - 2014-11-14 01:03

Paul Kreiner made changes - 2014-11-14 22:58

Field	Original Value	New Value
Component/s		SSL [ 10112 ]
Affects Version/s		10.0 [ 16000 ]
Summary	Unable to detect network timeout in 10.x (regression from 5.5)	Unable to detect network timeout in 10.x when using SSL (regression from 5.5)

Description (original value):

Summary:
When simulating a network connectivity loss, MariaDB 10.x does not detect the loss of network connectivity although 5.5 does. This is regardless of the value of the "slave_net_timeout" variable.

Reproduced using the 10.0.14 GLIBC214 build and 10.0.0 builds, using both GTID and binlog-based replication. Works as expected on 5.5.40 with binlog-based replication. Using the binary .tar.gz MariaDB builds downloaded from the Mariadb servers.

Steps to reproduce (functional case):
- Set up MariaDB with two 5.5.40 servers in master-slave configuration and ensure replication is working.
- Start generating traffic on the master (say, using pt-heartbeat). Watch the slave status to see the traffic being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all traffic from the master host.
- Wait for slave_net_timeout seconds to pass. The slave will restart as documented, and the slave status will now state that it is attempting to reconnect to the master.

Steps to reproduce (broken case):
- Set up MariaDB with two 10.0.14 servers in master-slave configuration and ensure replication is working.
- Start generating traffic on the master (say, using pt-heartbeat). Watch the slave status to see the traffic being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all traffic from the master host.
- Wait for slave_net_timeout seconds to pass. The slave status will continue to state "waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until the slave is stopped and restarted -- it will not restart on its own, contrary to documentation.

Description (new value):

Summary:
When simulating a network connectivity loss between replicating servers, if the replication channel uses SSL then MariaDB 10.x does not detect the loss of network connectivity, although 5.5 does. This is regardless of the value of the "slave_net_timeout" variable.

Reproduced using the 10.0.14 GLIBC214 build and 10.0.0 builds, using both GTID and binlog-based replication. Works as expected on 5.5.40 with binlog-based replication. Using the binary .tar.gz MariaDB builds downloaded from the Mariadb servers.

Steps to reproduce (functional case):
- Set up MariaDB with two 5.5.40 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master (say, using pt-heartbeat). Watch the slave status to see the traffic being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all traffic from the master host.
- Wait for slave_net_timeout seconds to pass. The slave will restart as documented, and the slave status will now state that it is attempting to reconnect to the master.

Steps to reproduce (broken case):
- Set up MariaDB with two 10.0.14 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master (say, using pt-heartbeat). Watch the slave status to see the traffic being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all traffic from the master host.
- Wait for slave_net_timeout seconds to pass. The slave status will continue to state "waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until the slave is stopped and restarted -- it will not restart on its own, contrary to documentation.

Paul Kreiner made changes - 2014-11-14 23:01

Description

Summary:
When simulating a network connectivity loss between replicating servers, if the replication channel uses SSL then MariaDB 10.x does not detect the loss of network connectivity, although 5.5 does. This is regardless of the value of the "slave_net_timeout" variable.

Reproduced using the 10.0.14 GLIBC214 build and 10.0.0 builds, using both GTID (10.0.14) and binlog-based replication (10.0.14/10.0.0). Works as expected on 5.5.40 with binlog-based replication. I am testing with the binary .tar.gz MariaDB builds downloaded from the Mariadb servers (archive.mariadb.org).

Steps to reproduce (functional case):
- Set up MariaDB with two 5.5.40 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave will restart as documented, and the slave status will now state that it is attempting to reconnect to the master.

Steps to reproduce (broken case):
- Set up MariaDB with two 10.0.14 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave status will continue to state "waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until the slave is stopped and restarted -- it will not restart on its own, contrary to documentation.

Paul Kreiner made changes - 2014-11-14 23:05

Description

Summary:
When simulating a network connectivity loss between replicating servers, if the replication channel uses SSL then MariaDB 10.x does not detect the loss of network connectivity, although 5.5 does. This is regardless of the value of the "slave_net_timeout" variable.

Reproduced using the 10.0.14 GLIBC214 build and 10.0.0 builds, using both GTID (10.0.14) and binlog-based replication (10.0.14/10.0.0). Works as expected on 5.5.40 with binlog-based replication. I am testing with the binary .tar.gz MariaDB builds downloaded from the Mariadb servers (archive.mariadb.org).

Steps to reproduce (functional case):
- Set up MariaDB with two 5.5.40 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave will restart as documented, and the slave status will now state that it is attempting to reconnect to the master.

Steps to reproduce (broken case):
- Set up MariaDB with two 10.0.14 servers in master-slave configuration and ensure replication is working and SSL-encrypted.
- Start generating traffic on the master. Watch the slave status to see the traffic is being replicated successfully.
- Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
- Wait for slave_net_timeout seconds to pass. The slave status will continue to state "waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until the slave is stopped and restarted -- it will not restart on its own, contrary to documentation. This is a change in behavior from Mariadb 5.5 and appears to be incorrect behavior.

Elena Stepanova made changes - 2014-11-15 22:17

Labels

upstream

Elena Stepanova made changes - 2014-11-15 22:19

Fix Version/s		10.0 [ 16000 ]
Due Date		2014-11-22

Elena Stepanova made changes - 2014-11-15 22:22

Fix Version/s

10.0 [ 16000 ]

Elena Stepanova made changes - 2014-11-18 01:23

Remote Link

This issue links to "Bug #74908 - Unable to detect network timeout in 5.6 when using SSL (regression from 5.5) (Web Link)" [ 21501 ]

Sergei Golubchik made changes - 2015-01-17 02:41

Due Date	2014-11-22
Priority	Major [ 3 ]	Minor [ 4 ]

Elena Stepanova made changes - 2015-01-29 18:22

Fix Version/s		10.0 [ 16000 ]
Labels	upstream	upstream verified

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow

MariaDB v2 [ 58456 ]

MariaDB v3 [ 67154 ]

Sergei Golubchik made changes - 2021-12-06 21:32

Workflow

MariaDB v3 [ 67154 ]

MariaDB v4 [ 139729 ]

Sergei Golubchik made changes - 2022-09-08 09:53

Fix Version/s

10.0 [ 16000 ]

Brandon Nesterenko made changes - 2024-06-22 16:23

Resolution Date

2024-06-22 16:23:33.0

2024-06-22 16:23:33.323

Brandon Nesterenko made changes - 2024-06-22 16:23

Fix Version/s		10.5.26 [ 29832 ]
Resolution		Fixed [ 1 ]
Status	Open [ 1 ]	Closed [ 6 ]

JiraAutomate made changes - 2024-06-22 16:23

Fix Version/s		10.6.19 [ 29833 ]
Fix Version/s		10.11.9 [ 29834 ]
Fix Version/s		11.1.6 [ 29835 ]
Fix Version/s		11.2.5 [ 29836 ]
Fix Version/s		11.4.3 [ 29837 ]

People

Assignee: Unassigned

Reporter: Paul Kreiner

Votes: 0

Watchers: 7

Dates

Created: 2014-11-14 01:03

Updated: 2024-06-22 16:23

Resolved: 2024-06-22 16:23
