MariaDB MaxScale
MXS-2454

During connectivity problems in a Master-Slave topology, MaxScale can get confused: it promotes the slave to master within its own view of the topology without stopping the slave threads, instead treating the former master as an external master.


    Details

    • Sprint:
      MXS-SPRINT-82

      Description

      It looks like an intermittent network issue that eventually confuses MaxScale:

      2019-04-09 22:30:25 error : (1945819) Lost connection to the master server, closing session. Lost connection to master server while connection was idle. Connection has been idle for 28778.2 seconds. Error caused by: #HY000: Lost connection to backend server.
       
      2019-04-09 22:40:41 error : (2115177) Lost connection to the master server, closing session. Lost connection to master server while waiting for a result. Connection has been idle for 0.0 seconds. Error caused by: #HY000: Lost connection to backend server. (x3)
       
      2019-04-09 22:51:32 error : (1842568) Lost connection to the master server, closing session. Lost connection to master server while connection was idle. Connection has been idle for 28778.2 seconds. Error caused by: #HY000: Lost connection to backend server.
      

      etc. until we get to:

      2019-04-09 23:05:21 warning: Error during monitor update of server 'server01': Query 'SHOW ALL SLAVES STATUS;' failed: 'Lost connection to MySQL server during query'.
      2019-04-09 23:05:43 error : Failure loading users data from backend [192.168.1.230:3306] for service [MasterSlave-Router]. MySQL error 2002, Can't connect to MySQL server on '192.168.1.230' (110)
      2019-04-09 23:05:43 warning: [MySQLAuth] MasterSlave-Router: login attempt for user 'vetdiss'@[192.168.1.240]:36973, authentication failed. User not found.
      2019-04-09 23:05:51 error : Monitor timed out when connecting to server server01[192.168.1.230:3306] : 'Can't connect to MySQL server on '192.168.1.230' (110)'
      2019-04-09 23:05:51 warning: 'server02' is a better master candidate than the current master 'server01'. Master will change when 'server01' is no longer a valid master.
      2019-04-09 23:05:51 notice : Server changed state: server01[192.168.1.230:3306]: master_down. [Master, Running] -> [Down]
      2019-04-09 23:05:51 error : Server server01 ([192.168.1.230]:3306) lost the master status while waiting for a result. Client sessions will be closed.
      

So at this point the monitor ejects server01 from the topology as unreachable and marks server02 as the better master candidate, but it defers resetting the slave configuration on server02 because it cannot reach server01.
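The lingering replication link described above could in principle be spotted on the promoted slave by inspecting `SHOW ALL SLAVES STATUS`. A hypothetical check in Python, representing each status row as a dict keyed by the statement's column names (`Master_Host`, `Slave_IO_Running`, ...) — the helper name and the dict representation are illustrative, not part of MaxScale:

```python
def lingering_replication(slave_status_rows, current_master_host):
    """Return replication connections that still point somewhere other
    than the current master -- evidence that a promotion skipped the
    slave reset. Rows are dicts keyed by SHOW ALL SLAVES STATUS column
    names; this helper is a sketch, not MaxScale code.
    """
    return [
        row for row in slave_status_rows
        if row["Master_Host"] != current_master_host
        and row["Slave_IO_Running"] != "No"
    ]

# Example mirroring the report: server02 (192.168.1.231) was promoted,
# but its IO thread is still trying to reach server01 (192.168.1.230).
rows = [{
    "Master_Host": "192.168.1.230",
    "Slave_IO_Running": "Connecting",
    "Slave_SQL_Running": "Yes",
}]
print(lingering_replication(rows, current_master_host="192.168.1.231"))
```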

      2019-04-09 23:05:51 error : Server server01 ([192.168.1.230]:3306) lost the master status while waiting for a result. Client sessions will be closed.
      2019-04-09 23:05:51 error : Lost connection to the master server, closing session. Lost connection to master server while connection was idle. Connection has been idle for 1360.8 seconds. Error caused by: #HY000: Lost connection to backend server.
      

and after further connection failures, server01's downtime finally exceeds failcount, at which point the monitor does this:

      2019-04-09 23:06:42 warning: The current master server 'server01' is no longer valid because it has been down over 5 (failcount) monitor updates and it does not have any running slaves. Selecting new master server.
      2019-04-09 23:06:42 warning: 'server01' is not a valid master candidate because it is down.
      2019-04-09 23:06:42 notice : Setting 'server02' as master.
      2019-04-09 23:06:42 notice : Cluster master server is replicating from an external master: server01.domain.com:3306
      2019-04-09 23:06:42 notice : Server changed state: server02[192.168.1.231:3306]: new_master. [Slave of External Server, Running] -> [Master, Slave of External Server, Running]
      

The problem is that at this point MaxScale, having been unable to reach server01 for some time, treats it as an external server and therefore does not stop replication from it. It declares server02 the master of a separate one-server topology while leaving it a slave of the 'external' server01.
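The sequence above can be modelled as a small state sketch. This is a hypothetical simulation of the decision path visible in the logs, not MaxScale's actual code; `FAILCOUNT = 5` matches the "5 (failcount)" in the warning:

```python
FAILCOUNT = 5  # matches the "5 (failcount)" reported in the log

def run_monitor(servers, ticks_down):
    """Simplified model of the master-selection pass seen in the logs.

    When the current master has been unreachable for more than
    `failcount` monitor updates, a slave is promoted. The bug: the
    promoted slave's replication link to the old master is never
    stopped, because the old master is merely re-classified as an
    "external" server.
    """
    master = next(s for s in servers.values() if "Master" in s["states"])
    if ticks_down <= FAILCOUNT:
        return  # old master is still within its grace period
    master["states"] = {"Down"}
    # Promote the best remaining candidate (here: the only slave).
    new_master = next(s for s in servers.values()
                      if s is not master and "Running" in s["states"])
    new_master["states"].add("Master")
    # The old master is no longer part of the monitored topology, so the
    # slave connection pointing at it is reported as "external" ...
    if new_master["replicating_from"] == master["name"]:
        new_master["states"].add("Slave of External Server")
        # ... and, crucially, replication is never stopped here:
        # new_master["replicating_from"] stays set.

servers = {
    "server01": {"name": "server01", "states": {"Master", "Running"},
                 "replicating_from": None},
    "server02": {"name": "server02", "states": {"Slave", "Running"},
                 "replicating_from": "server01"},
}
run_monitor(servers, ticks_down=6)
print(sorted(servers["server02"]["states"]))
print(servers["server02"]["replicating_from"])
```

The simulation ends exactly in the reported state: server02 is simultaneously `[Master, Slave of External Server, Running]` and still replicating from server01.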


            People

            Assignee:
            esa.korhonen Esa Korhonen
            Reporter:
            juan.vera Juan
Votes:
0
Watchers:
1
