Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.4
    • Fix Version/s: 2.1.0
    • Component/s: maxadmin, maxinfo
    • Labels: None
    • Sprint: 2017-28, 2017-29

    Description

      What replication-manager does is, in a monitoring goroutine, poll the server list and the monitor list.

      During switchover we do:

      • set the master flag and remove the slave flag on the elected master
      • set the slave flag and remove the master flag on the old master

      After a significant number of retries we are able to crash MaxScale.

      Note that I do open and close the connection in the failover and monitoring code.
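
      Expressed as maxadmin commands, the two switchover steps above would look roughly like the sketch below. This is illustrative only: the MaxscaleClient type and the server names are stand-ins, not replication-manager's actual code (only the "set server" / "clear server" commands and the Command method appear later in this report).

      package main

      import "fmt"

      // MaxscaleClient is a stand-in for the maxadmin client used by
      // replication-manager; Command would send one maxadmin command.
      type MaxscaleClient struct{}

      func (m *MaxscaleClient) Command(cmd string) error {
          fmt.Println("maxadmin>", cmd)
          return nil
      }

      // switchover expresses the two switchover steps as maxadmin commands.
      func switchover(m *MaxscaleClient, elected, oldMaster string) error {
          cmds := []string{
              // elected master: set master flag, remove slave flag
              "set server " + elected + " master",
              "clear server " + elected + " slave",
              // old master: set slave flag, remove master flag
              "set server " + oldMaster + " slave",
              "clear server " + oldMaster + " master",
          }
          for _, cmd := range cmds {
              if err := m.Command(cmd); err != nil {
                  return err
              }
          }
          return nil
      }

      func main() {
          _ = switchover(&MaxscaleClient{}, "server1", "server2")
      }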

      2017-02-27 10:34:05 notice : Server changed state: server1[192.168.0.41:5055]: server_down. [Running] -> [Down]
      2017-02-27 10:34:11 notice : Server changed state: server1[192.168.0.41:5055]: slave_up. [Down] -> [Slave, Running]
      2017-02-27 10:34:31 notice : Server changed state: server1[192.168.0.41:5055]: lost_slave. [Slave, Running] -> [Running]
      2017-02-27 10:34:32 notice : Server changed state: server1[192.168.0.41:5055]: new_slave. [Running] -> [Slave, Running]
      2017-02-27 10:36:12 notice : Server changed state: server1[192.168.0.41:5055]: lost_slave. [Slave, Running] -> [Running]
      2017-02-27 10:36:17 notice : Server changed state: server1[192.168.0.41:5055]: new_slave. [Running] -> [Slave, Running]
      2017-02-27 15:44:24 notice : Server changed state: server1[192.168.0.41:5055]: new_master. [Master, Slave, Running] -> [Master, Running]
      2017-02-27 15:44:24 notice : Server changed state: server2[192.168.0.41:5056]: new_slave. [Master, Running] -> [Slave, Running]
      2017-02-27 15:44:24 notice : A Master Server is now available: 192.168.0.41:5055
      2017-02-27 15:55:11 error : Fatal: MaxScale 2.0.4 received fatal signal 11. Attempting backtrace.
      2017-02-27 15:55:11 error : Commit ID: 00f16e1fa56765678131faea59b2a819e59d49cc System name: Linux Release string: Ubuntu 16.10
      2017-02-27 15:55:11 error : /usr/bin/maxscale() [0x403ca7]
      2017-02-27 15:55:11 error : /lib/x86_64-linux-gnu/libpthread.so.0(+0x11670) [0x7f6686ec6670]
      2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libcli.so(execute_cmd+0x2c) [0x7f667fea45d5]
      2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libcli.so(+0x417b) [0x7f667fea417b]
      2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscaled.so(+0x1828) [0x7f667e36b828]
      2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x4f338) [0x7f668738b338]
      2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(poll_waitevents+0x6c1) [0x7f668738aad9]
      2017-02-27 15:55:11 error : /usr/bin/maxscale(main+0x193b) [0x406eb0]
      2017-02-27 15:55:11 error : /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f66866ca3f1]
      2017-02-27 15:55:11 error : /usr/bin/maxscale(_start+0x29) [0x403629]

      Attachments

        Activity

          johan.wikman Johan Wikman added a comment -

          Downgrading to major. Will be investigated of course, but the circumstances are somewhat unusual.


          stephane@skysql.com VAROQUI Stephane added a comment -

          OK, I have more info on this. It does not happen with maxadmin alone, but with a combination of maxinfo as well; the first report description was misleading due to a typo in the replication-manager config.
          The full server list was being fetched using maxinfo while, at the same time, maxadmin was setting the state of the servers.
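
          A minimal sketch of the access pattern described in this comment: one goroutine keeps fetching the server list (as maxinfo would) while another keeps changing server state (as maxadmin would). The helper functions, server name, and intervals are placeholders for illustration, not replication-manager code.

          package main

          import (
              "log"
              "time"
          )

          // readServerList stands in for a maxinfo query fetching all servers.
          func readServerList() { log.Println("maxinfo: fetch server list") }

          // writeServerState stands in for a maxadmin "set server"/"clear server" command.
          func writeServerState(cmd string) { log.Println("maxadmin:", cmd) }

          func main() {
              // Reader: polls the server list continuously.
              go func() {
                  for {
                      readServerList()
                      time.Sleep(2 * time.Second)
                  }
              }()
              // Writer: flips server state flags at the same time.
              for i := 0; ; i++ {
                  if i%2 == 0 {
                      writeServerState("set server server1 master")
                  } else {
                      writeServerState("clear server server1 master")
                  }
                  time.Sleep(time.Second)
              }
          }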

          esa.korhonen Esa Korhonen added a comment -

          Hello, Stephane. I'm trying to reproduce your crash.
          Could you be more specific about the MaxInfo commands used? Are you using the HTTP or SQL interface? What commands are given? Is it just "SHOW SERVERS"?


          stephane@skysql.com VAROQUI Stephane added a comment -

          Hello Esa, yes, just /servers and /monitors, polled at roughly a 2 s interval (a rough sketch of such a polling loop follows this comment).

          On the other hand, on a 2-node cluster, I run the following in a long-timeout loop:

          err = m.Command("clear server " + cluster.master.MxsServerName + " slave")
          if err != nil {
              cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err)
          }
          if err != nil {
              cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err)
          }
          if cluster.conf.MxsMonitor == false {
              for _, s := range cluster.slaves {
                  err = m.Command("clear server " + s.MxsServerName + " master")
                  if err != nil {
                      cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err)
                  }
                  err = m.Command("set server " + s.MxsServerName + " slave")
                  if err != nil {
                      cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err)
                  }
              }
              if oldmaster != nil {
                  err = m.Command("clear server " + oldmaster.MxsServerName + " master")
                  if err != nil {
                      cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err)
                  }
                  err = m.Command("set server " + oldmaster.MxsServerName + " slave")
                  if err != nil {
                      cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err)
                  }
              }
          }

          i.e., inverting the master and slave roles.
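
          For reference, the maxinfo side of the workload mentioned at the start of this comment (polling /servers and /monitors about every 2 seconds) could look roughly like the sketch below; the listener address and port are assumptions, not taken from the report.

          package main

          import (
              "io"
              "log"
              "net/http"
              "time"
          )

          // poll fetches one maxinfo HTTP endpoint and discards the body.
          func poll(base, path string) {
              resp, err := http.Get(base + path)
              if err != nil {
                  log.Printf("GET %s: %v", path, err)
                  return
              }
              defer resp.Body.Close()
              io.Copy(io.Discard, resp.Body) // body content not needed for this sketch
          }

          func main() {
              // Address/port of the maxinfo HTTP listener: an assumption for this sketch.
              base := "http://127.0.0.1:8003"
              ticker := time.NewTicker(2 * time.Second)
              defer ticker.Stop()
              for range ticker.C {
                  poll(base, "/servers")
                  poll(base, "/monitors")
              }
          }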

          stephane@skysql.com VAROQUI Stephane added a comment - edited

          Note that I didn't really note the status of the monitor at the time; it may have been stopped or it may have been running.

          esa.korhonen Esa Korhonen added a comment -

          I didn't manage to reproduce this (I only got Valgrind to detect one invalid read/write).

          Server state setting code has been modified for versions 2.1.X and the cause for this crash may well be fixed in the new versions. If this crash keeps occurring, you may wish to test with 2.1.0.


          People

            Assignee: esa.korhonen Esa Korhonen
            Reporter: stephane@skysql.com VAROQUI Stephane
            Votes: 0
            Watchers: 3

