MariaDB Server / MDEV-7111

Unable to detect network timeout in 10.x when using SSL (regression from 5.5)

Details

    Description

      Summary:
      When simulating a network connectivity loss between replicating servers, if the replication channel uses SSL, MariaDB 10.x does not detect the loss of network connectivity, although 5.5 does. This happens regardless of the value of the "slave_net_timeout" variable.

      Reproduced using the 10.0.14 GLIBC214 build and the 10.0.0 builds, with both GTID-based (10.0.14) and binlog-based replication (10.0.14/10.0.0). Works as expected on 5.5.40 with binlog-based replication. I am testing with the binary .tar.gz MariaDB builds downloaded from the MariaDB servers (archive.mariadb.org).

      Steps to reproduce (functional case):

      • Set up two MariaDB 5.5.40 servers in a master-slave configuration and ensure that replication is working and SSL-encrypted.
      • Start generating traffic on the master. Watch the slave status to confirm that the traffic is being replicated successfully.
      • Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
      • Wait for slave_net_timeout seconds to pass. The slave will restart as documented, and the slave status will now state that it is attempting to reconnect to the master.
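The functional case depends on a read timeout. When iptables silently drops the master's packets, no FIN or RST ever reaches the slave, so a blocking read on the replication socket can only return if a timeout is armed on it. The minimal Python sketch below illustrates that general mechanism, assuming a local socket pair as a stand-in for the replication connection. It is an analogy, not MariaDB's implementation; `read_event` and the short timeouts are hypothetical stand-ins for the slave's event read and slave_net_timeout.

```python
import socket


def read_event(sock: socket.socket, timeout_s: float) -> str:
    """Hypothetical stand-in for the slave waiting for the next event.

    Returns "event" if data arrives, "reconnect" if the read times out
    (the behavior observed on 5.5), or "closed" if the peer closed cleanly.
    """
    sock.settimeout(timeout_s)  # arm the read timeout (analogue of slave_net_timeout)
    try:
        data = sock.recv(4096)
    except socket.timeout:
        # No bytes and no FIN/RST within the timeout: treat the link as dead.
        return "reconnect"
    return "event" if data else "closed"


if __name__ == "__main__":
    master, slave = socket.socketpair()
    master.sendall(b"binlog event")
    print(read_event(slave, 0.5))  # prints: event
    # The master now goes silent without closing, like packets dropped by iptables.
    print(read_event(slave, 0.5))  # prints: reconnect (after about 0.5 s)
    master.close()
    slave.close()
```

Without the `settimeout` call, the second read would block indefinitely. From the outside, that looks like the broken 10.0 case reported here for SSL connections.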

      Steps to reproduce (broken case):

      • Set up two MariaDB 10.0.14 servers in a master-slave configuration and ensure that replication is working and SSL-encrypted.
      • Start generating traffic on the master. Watch the slave status to confirm that the traffic is being replicated successfully.
      • Simulate a network failure, e.g. "iptables -I INPUT -s <master_ip> -j DROP" on the slave. This drops all network packets from the master host.
      • Wait for slave_net_timeout seconds to pass. The slave status will continue to state "waiting for master to send event", even though the log position counters are not advancing. The slave will remain in this state until it is stopped and restarted; it will not restart on its own, contrary to the documentation. This is a change in behavior from MariaDB 5.5 and appears to be incorrect.
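In the broken case, Slave_IO_State keeps showing "Waiting for master to send event" while the log position counters stop advancing. An external watchdog can therefore flag it. The sketch below is a hypothetical helper, not a MariaDB tool. It assumes you already collect periodic samples of (time in seconds, Master_Log_File, Read_Master_Log_Pos) from SHOW SLAVE STATUS, and it reports whether that pair has been unchanged for longer than slave_net_timeout. It cannot tell a dead link from an idle master, so it also assumes traffic is being generated on the master, as in the steps above.

```python
from typing import List, Tuple

# One SHOW SLAVE STATUS sample taken by the caller (collection is not shown):
# (sample time in seconds, Master_Log_File, Read_Master_Log_Pos)
Sample = Tuple[float, str, int]


def seconds_since_progress(samples: List[Sample]) -> float:
    """Seconds between the last change of (Master_Log_File, Read_Master_Log_Pos)
    and the newest sample. Comparing the file name too means a binlog rotation
    counts as progress even if the position number happens to repeat."""
    if not samples:
        return 0.0
    changed_at = samples[0][0]
    prev = (samples[0][1], samples[0][2])
    for t, log_file, pos in samples[1:]:
        if (log_file, pos) != prev:
            changed_at = t
            prev = (log_file, pos)
    return samples[-1][0] - changed_at


def looks_stuck(samples: List[Sample], slave_net_timeout: float) -> bool:
    """True if replication has made no progress for longer than slave_net_timeout.

    Only meaningful while the master is known to be writing events; an idle
    master produces the same flat position counters."""
    return seconds_since_progress(samples) > slave_net_timeout


if __name__ == "__main__":
    # Hypothetical samples: the position stops at 250 after t = 30 s.
    samples = [
        (0.0, "mysql-bin.000001", 100),
        (30.0, "mysql-bin.000001", 250),
        (60.0, "mysql-bin.000001", 250),
        (120.0, "mysql-bin.000001", 250),
    ]
    print(seconds_since_progress(samples))  # 90.0
    print(looks_stuck(samples, slave_net_timeout=60))  # True
```

In practice the samples would come from polling SHOW SLAVE STATUS every few seconds. The polling loop is omitted to keep the sketch free of client-library dependencies.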

        Activity

          Paul Kreiner (paulkreiner) created the issue.
          Paul Kreiner (paulkreiner) made changes:
          Field: Original Value → New Value
          Component/s: SSL [ 10112 ]
          Affects Version/s: 10.0 [ 16000 ]
          Description: revised to state that the problem occurs when the replication channel uses SSL, and to require SSL-encrypted replication in both reproduction cases (the original text did not mention SSL).
          Summary: Unable to detect network timeout in 10.x (regression from 5.5) → Unable to detect network timeout in 10.x when using SSL (regression from 5.5)
          Paul Kreiner (paulkreiner) made changes:
          Description: revised to specify the builds used for GTID (10.0.14) and binlog-based replication (10.0.14/10.0.0), to name archive.mariadb.org as the download source, and to drop the suggestion of generating traffic with pt-heartbeat, with minor wording changes.
          Paul Kreiner (paulkreiner) made changes:
          Description: revised to add the closing sentence stating that this is a change in behavior from MariaDB 5.5 and appears to be incorrect.
          Elena Stepanova (elenst) made changes:
          Labels: upstream
          Elena Stepanova (elenst) made changes:
          Fix Version/s: 10.0 [ 16000 ]
          Due Date: 2014-11-22
          Elena Stepanova (elenst) made changes:
          Fix Version/s: 10.0 [ 16000 ]
          Elena Stepanova (elenst) made changes:
          Sergei Golubchik (serg) made changes:
          Due Date: 2014-11-22
          Priority: Major [ 3 ] → Minor [ 4 ]
          Elena Stepanova (elenst) made changes:
          Fix Version/s: 10.0 [ 16000 ]
          Labels: upstream → upstream verified
          Rasmus Johansson (ratzpo, inactive) made changes:
          Workflow: MariaDB v2 [ 58456 ] → MariaDB v3 [ 67154 ]
          Sergei Golubchik (serg) made changes:
          Workflow: MariaDB v3 [ 67154 ] → MariaDB v4 [ 139729 ]
          Sergei Golubchik (serg) made changes:
          Fix Version/s: 10.0 [ 16000 ]
          Brandon Nesterenko (bnestere) made changes:
          Resolution Date: 2024-06-22 16:23:33.0 → 2024-06-22 16:23:33.323
          Brandon Nesterenko (bnestere) made changes:
          Fix Version/s: 10.5.26 [ 29832 ]
          Resolution: Fixed [ 1 ]
          Status: Open [ 1 ] → Closed [ 6 ]
          JiraAutomate made changes:
          Fix Version/s: 10.6.19 [ 29833 ]
          Fix Version/s: 10.11.9 [ 29834 ]
          Fix Version/s: 11.1.6 [ 29835 ]
          Fix Version/s: 11.2.5 [ 29836 ]
          Fix Version/s: 11.4.3 [ 29837 ]

          People

            Assignee: Unassigned
            Reporter: Paul Kreiner (paulkreiner)
            Votes: 0
            Watchers: 7

            Dates

              Resolved: 2024-06-22
