  1. MariaDB MaxScale
  2. MXS-3254

Monitor failover fails



    • MXS-SPRINT-141


      The pinloki switchover test causes the monitor to fail as described below. Rare scenario, not likely to happen in the real world.

      niclas: The pinloki test in review revealed two monitor TODOs. First (which I think has come up before), the monitor deduces that a replica is replicating from an "external" server by comparing IPs. So a server can be external or internal depending on where the IP comes from and how the monitor is configured. It should be consistent.
      Second, if the sleep(5) in the test is replaced with test.maxscale().wait_monitor_ticks(5), the monitor ties itself in knots and maxctrl becomes unresponsive.
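The first point, that comparing addresses can classify the same server as external or internal depending on how it is written, can be illustrated with a minimal sketch. This is not the mariadbmon logic; `canonical_endpoint`, `is_external_master`, and the `resolve` callback are hypothetical names, and the sketch assumes that normalising both sides to one canonical IP before comparing is an acceptable way to make the result consistent.

```python
import ipaddress


def canonical_endpoint(host, port, resolve):
    """Return (ip_string, port) with the host reduced to one canonical IP.

    `resolve` is a hypothetical callback mapping a hostname to an IP string.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a hostname and resolve it.
        ip = ipaddress.ip_address(resolve(host))
    # Treat an IPv4-mapped IPv6 address as its IPv4 equivalent.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return str(ip), int(port)


def is_external_master(master_host, master_port, monitored, resolve):
    """True if the master endpoint matches none of the monitored servers."""
    target = canonical_endpoint(master_host, master_port, resolve)
    known = {canonical_endpoint(h, p, resolve) for h, p in monitored}
    return target not in known


# Hypothetical configuration: one server is configured by hostname,
# while the replica reports its master by IP.
HOSTS = {"db1.example": "192.168.0.10"}
MONITORED = [("db1.example", 3306), ("192.168.0.11", 3306)]


def naive_is_external(master_host, master_port, monitored):
    """Plain string comparison, which depends on how the address is written."""
    return (master_host, master_port) not in monitored
```

With these assumed values, the plain string comparison reports `192.168.0.10:3306` as external because the monitored server is listed by hostname, while the normalised comparison reports it as internal.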
      esak: The monitor gets stuck?
      niclas: Something goes awry and the monitor goes into a loop trying to STOP SLAVE, which fails.
      I didn't look into it much; I just noticed that something gets messed up when the two scenarios play out at the same time.
      esak: It's likely not an infinite loop; it probably depends on some timeout settings.
      But why does "stop slave" fail?
      niclas: That's the part that needs to be dug into.
      2020-10-22 10:49:20   warning: [mariadbmon] Query 'SET STATEMENT max_statement_time=3 FOR STOP SLAVE '';' failed on 'pinloki': 'Lost connection to MySQL server during query' (2013). Retrying with 86.9 seconds left.
      esak: Could there be some weird deadlock where one thread cannot advance before the other? It's a bit weird, since the monitor runs in its own thread.
      niclas: I think it is something like that.
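esak's remark that the loop is probably bounded by timeout settings rather than infinite can be sketched as a retry loop with an overall time budget. This is only an illustration, not the mariadbmon implementation: `retry_until_deadline`, its parameters, and `FakeClock` are hypothetical names. Each failed attempt is retried only while time remains in the budget, which is consistent with the "Retrying with 86.9 seconds left" wording in the logged warning.

```python
import time


def retry_until_deadline(operation, total_timeout, retry_delay=0.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it returns True or total_timeout seconds pass.

    Returns (succeeded, attempts). All names here are hypothetical.
    """
    deadline = clock() + total_timeout
    attempts = 0
    while True:
        attempts += 1
        if operation():
            return True, attempts
        time_left = deadline - clock()
        if time_left <= 0:
            # Budget exhausted: give up instead of looping forever.
            return False, attempts
        # Loosely mirrors the logged warning; not the real log format.
        print(f"Query failed. Retrying with {time_left:.1f} seconds left.")
        sleep(min(retry_delay, time_left))


class FakeClock:
    """Deterministic clock so the sketch can be exercised without waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

If every attempt fails and each one consumes its full statement timeout, the loop still ends once the budget is spent. From the outside, a long budget combined with slow failing attempts could look like a hang even though it is not an infinite loop.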




            markus makela markus makela
            nantti Niclas Antti