[MXS-1148] CoreDump using maxadmin Created: 2017-02-27  Updated: 2017-03-02  Resolved: 2017-03-02

Status: Closed
Project: MariaDB MaxScale
Component/s: maxadmin, maxinfo
Affects Version/s: 2.0.4
Fix Version/s: 2.1.0

Type: Bug Priority: Major
Reporter: VAROQUI Stephane Assignee: Esa Korhonen
Resolution: Cannot Reproduce Votes: 0
Labels: None

Sprint: 2017-28, 2017-29

 Description   

In a monitoring goroutine, the replication manager polls the server list and the monitor list.

During switchover we do the following (a sketch of the corresponding maxadmin commands follows the list):

  • set the master flag and remove the slave flag on the elected master
  • set the slave flag and remove the master flag on the old master
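
These steps map directly onto maxadmin's set server / clear server commands. A minimal sketch, assuming server1 is the elected master and server2 the old master (the server names are placeholders, not taken from this ticket):

    maxadmin set server server1 master
    maxadmin clear server server1 slave
    maxadmin set server server2 slave
    maxadmin clear server server2 master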

After a significant number of retries we are able to crash MaxScale.

Note that I open and close the connection in the failover and monitoring code.

2017-02-27 10:34:05 notice : Server changed state: server1[192.168.0.41:5055]: server_down. [Running] -> [Down]
2017-02-27 10:34:11 notice : Server changed state: server1[192.168.0.41:5055]: slave_up. [Down] -> [Slave, Running]
2017-02-27 10:34:31 notice : Server changed state: server1[192.168.0.41:5055]: lost_slave. [Slave, Running] -> [Running]
2017-02-27 10:34:32 notice : Server changed state: server1[192.168.0.41:5055]: new_slave. [Running] -> [Slave, Running]
2017-02-27 10:36:12 notice : Server changed state: server1[192.168.0.41:5055]: lost_slave. [Slave, Running] -> [Running]
2017-02-27 10:36:17 notice : Server changed state: server1[192.168.0.41:5055]: new_slave. [Running] -> [Slave, Running]
2017-02-27 15:44:24 notice : Server changed state: server1[192.168.0.41:5055]: new_master. [Master, Slave, Running] -> [Master, Running]
2017-02-27 15:44:24 notice : Server changed state: server2[192.168.0.41:5056]: new_slave. [Master, Running] -> [Slave, Running]
2017-02-27 15:44:24 notice : A Master Server is now available: 192.168.0.41:5055
2017-02-27 15:55:11 error : Fatal: MaxScale 2.0.4 received fatal signal 11. Attempting backtrace.
2017-02-27 15:55:11 error : Commit ID: 00f16e1fa56765678131faea59b2a819e59d49cc System name: Linux Release string: Ubuntu 16.10
2017-02-27 15:55:11 error : /usr/bin/maxscale() [0x403ca7]
2017-02-27 15:55:11 error : /lib/x86_64-linux-gnu/libpthread.so.0(+0x11670) [0x7f6686ec6670]
2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libcli.so(execute_cmd+0x2c) [0x7f667fea45d5]
2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libcli.so(+0x417b) [0x7f667fea417b]
2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscaled.so(+0x1828) [0x7f667e36b828]
2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x4f338) [0x7f668738b338]
2017-02-27 15:55:11 error : /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(poll_waitevents+0x6c1) [0x7f668738aad9]
2017-02-27 15:55:11 error : /usr/bin/maxscale(main+0x193b) [0x406eb0]
2017-02-27 15:55:11 error : /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f66866ca3f1]
2017-02-27 15:55:11 error : /usr/bin/maxscale(_start+0x29) [0x403629]



 Comments   
Comment by Johan Wikman [ 2017-02-28 ]

Downgrading to major. It will be investigated, of course, but the circumstances are somewhat unusual.

Comment by VAROQUI Stephane [ 2017-02-28 ]

OK, I have more info on this. It happens not with maxadmin alone but in combination with maxinfo as well; the first report's description was misleading due to a typo in the replication-manager config.
The whole server list was being fetched through maxinfo while, at the same time, maxadmin was setting the state of the servers.

Comment by Esa Korhonen [ 2017-02-28 ]

Hello, Stephane. I'm trying to reproduce your crash.
Could you be more specific about the MaxInfo commands used? Are you using the HTTP or the SQL interface? Which commands are given? Is it just "SHOW SERVERS"?

Comment by VAROQUI Stephane [ 2017-02-28 ]

Hello Esa, yes, just /servers and /monitors, polled at roughly a 2-second interval.
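
For reference, a minimal Go sketch of that polling loop, assuming maxinfo's HTTP listener is reachable at 127.0.0.1:8003 (the address and port are assumptions, not taken from this ticket):

    package main

    import (
        "io"
        "log"
        "net/http"
        "time"
    )

    func main() {
        // Assumed maxinfo HTTP listener address; adjust to the actual listener.
        base := "http://127.0.0.1:8003"

        // Poll every 2 seconds, as described above.
        tick := time.NewTicker(2 * time.Second)
        defer tick.Stop()
        for range tick.C {
            // The two resources the replication manager fetches.
            for _, path := range []string{"/servers", "/monitors"} {
                resp, err := http.Get(base + path)
                if err != nil {
                    log.Printf("maxinfo poll %s failed: %v", path, err)
                    continue
                }
                body, _ := io.ReadAll(resp.Body)
                resp.Body.Close()
                log.Printf("%s returned %d bytes", path, len(body))
            }
        }
    }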

On the other end, on a 2-node cluster, I run the following in a loop with a long timeout:

err = m.Command("clear server " + cluster.master.MxsServerName + " slave")
if err != nil

{ cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err) }

if err != nil

{ cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err) }

if cluster.conf.MxsMonitor == false {
for _, s := range cluster.slaves {
err = m.Command("clear server " + s.MxsServerName + " master")
if err != nil

{ cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err) }

err = m.Command("set server " + s.MxsServerName + " slave")
if err != nil

{ cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err) }

}
if oldmaster != nil {
err = m.Command("clear server " + oldmaster.MxsServerName + " master")
if err != nil

{ cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err) }

err = m.Command("set server " + oldmaster.MxsServerName + " slave")
if err != nil

{ cluster.LogPrint("ERROR: MaxScale client could not send command:%s", err) }

}

}

That is, it inverts the master and slave roles.

Comment by VAROQUI Stephane [ 2017-02-28 ]

Note that I didn't record the status of the monitor at the time; it may have been stopped or it may have been running.

Comment by Esa Korhonen [ 2017-03-02 ]

I didn't manage to reproduce this (valgrind only detected one invalid read/write).

The server state setting code has been modified for the 2.1.X versions, and the cause of this crash may well be fixed in the new versions. If the crash keeps occurring, you may wish to test with 2.1.0.
