[MXS-337] MaxScale 1.1.1 crashed with Signal 6 Created: 2015-08-27  Updated: 2015-11-14  Resolved: 2015-09-07

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 1.1.1
Fix Version/s: 1.2.0

Type: Bug Priority: Blocker
Reporter: Claudio Nanni Assignee: Johan Wikman
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

RHEL 6.5



Description

After running for several weeks, MaxScale crashed with:

2015-08-26 06:28:34 Fatal: MaxScale received fatal signal 6. Attempting backtrace.
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale() [0x54979c]
2015-08-26 06:28:34 /lib64/libpthread.so.0() [0x3588c0f710]
2015-08-26 06:28:34 /lib64/libc.so.6(gsignal+0x35) [0x3588832625]
2015-08-26 06:28:34 /lib64/libc.so.6(abort+0x175) [0x3588833e05]
2015-08-26 06:28:34 /lib64/libc.so.6() [0x3588870537]
2015-08-26 06:28:34 /lib64/libc.so.6() [0x3588875e66]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale(gwbuf_free+0x11d) [0x548950]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale(gwbuf_consume+0xae) [0x548de6]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x851c) [0x7f3fc4d8d51c]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x6890) [0x7f3fc4d8b890]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/modules/libMySQLBackend.so(+0x3199) [0x7f3fc4538199]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale() [0x55b9c1]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale(poll_waitevents+0x634) [0x55b280]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale(main+0x1af6) [0x54c39d]
2015-08-26 06:28:34 /lib64/libc.so.6(__libc_start_main+0xfd) [0x358881ed5d]
2015-08-26 06:28:34 /usr/local/mariadb-maxscale/bin/maxscale() [0x5485bd]



Comments
Comment by Dipti Joshi (Inactive) [ 2015-08-31 ]

johan.wikman Can we pinpoint the line number?

Comment by markus makela [ 2015-08-31 ]

It crashed on line 3747 in readwritesplit.c

Comment by Johan Wikman [ 2015-08-31 ]

A slightly more informative stacktrace.

/usr/local/mariadb-maxscale/bin/maxscale() [0x54979c]
/lib64/libpthread.so.0() [0x3588c0f710]
/lib64/libc.so.6(gsignal+0x35) [0x3588832625]
/lib64/libc.so.6(abort+0x175) [0x3588833e05]
/lib64/libc.so.6() [0x3588870537]
/lib64/libc.so.6() [0x3588875e66]
/usr/local/mariadb-maxscale/bin/maxscale(gwbuf_free+0x11d) [0x548950]
/usr/local/mariadb-maxscale/bin/maxscale(gwbuf_consume+0xae) [0x548de6]
/home/ec2-user/workspace/server/modules/routing/readwritesplit/readwritesplit.c:3747
/home/ec2-user/workspace/server/modules/routing/readwritesplit/readwritesplit.c:2789
/home/ec2-user/workspace/server/modules/protocol/mysql_backend.c:568
/usr/local/mariadb-maxscale/bin/maxscale() [0x55b9c1]
/usr/local/mariadb-maxscale/bin/maxscale(poll_waitevents+0x634) [0x55b280]
/usr/local/mariadb-maxscale/bin/maxscale(main+0x1af6) [0x54c39d]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x358881ed5d]
/usr/local/mariadb-maxscale/bin/maxscale() [0x5485bd]

Comment by Johan Wikman [ 2015-08-31 ]

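The signal occurs inside the very last call to free in gwbuf_free. The backtrace address 0x548950 (gwbuf_free+0x11d) is the return address immediately after the callq to free@plt: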

  548942:       75 c7                   jne    54890b <gwbuf_free+0xd8>
  548944:       48 8b 45 d8             mov    -0x28(%rbp),%rax
  548948:       48 89 c7                mov    %rax,%rdi
  54894b:       e8 70 65 ff ff          callq  53eec0 <free@plt>
  548950:       c9                      leaveq 
  548951:       c3                      retq   

Comment by Johan Wikman [ 2015-09-07 ]

The problem appears to be the following lines in readwritesplit.c:

static GWBUF* sescmd_cursor_process_replies(...)
{
    ...
    if (bref->reply_cmd != scmd->reply_cmd)
    {
        skygw_log_write(LOGFILE_TRACE, "Backend "
                        "server '%s' response differs from master's response. "
                        "Closing connection.",
                        bref->bref_backend->backend_server->unique_name);
        ...
        if (replybuf)
            gwbuf_consume(replybuf, gwbuf_length(replybuf)); /* return value discarded: replybuf may already be freed */
    }
    ...
    return replybuf;
}

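If a slave does not return the same response as the master, the function returns a GWBUF that has already been consumed, and therefore freed. When that buffer is later freed again (in the backtrace, via gwbuf_consume -> gwbuf_free), the result is a double free, which glibc detects and answers by calling abort(), i.e. SIGABRT (signal 6).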

Comment by Johan Wikman [ 2015-09-07 ]

The bug (as outlined in the last comment) is fixed in release 1.2.

The corresponding lines now read:

    if (replybuf)
        while ((replybuf = gwbuf_consume(replybuf, gwbuf_length(replybuf))));

That is, when the slave's response differs from the master's, the GWBUF is fully consumed and NULL is (correctly) returned instead of a pointer to freed memory.

If an upgrade to 1.2 is not possible, then this needs to be backported to 1.1.1.

Comment by cai sunny [ 2015-11-14 ]

johan.wikman

MaxScale 1.2 crashed with this error on our system. Could you please help us check what we should do?
Should we route reads and writes to the same server?

maxscale-1.2.0-1.x86_64

The error message is:
2015-11-11 20:50:50 XXX PR RW Split Service: Refresh rate limit exceeded for load of users' table.
2015-11-11 20:50:50 Error : Unable to write to backend due to authentication failure.
2015-11-11 20:50:50 Fatal: MaxScale received fatal signal 6. Attempting backtrace.
2015-11-11 20:50:50 /usr/bin/maxscale() [0x5238ab]
2015-11-11 20:50:50 /lib64/libpthread.so.0(+0xf790) [0x7f13b7e25790]
2015-11-11 20:50:50 /lib64/libc.so.6(gsignal+0x35) [0x7f13b66d8625]
2015-11-11 20:50:50 /lib64/libc.so.6(abort+0x175) [0x7f13b66d9e05]
2015-11-11 20:50:50 /lib64/libc.so.6(+0x70537) [0x7f13b6716537]
2015-11-11 20:50:50 /lib64/libc.so.6(+0x75e66) [0x7f13b671be66]
2015-11-11 20:50:50 /lib64/libc.so.6(+0x7897a) [0x7f13b671e97a]
2015-11-11 20:50:50 /usr/bin/maxscale(gwbuf_free+0x11d) [0x522940]
2015-11-11 20:50:50 /usr/bin/maxscale(gwbuf_consume+0xae) [0x522de2]
2015-11-11 20:50:50 /usr/lib64/maxscale/libreadwritesplit.so(+0xaebd) [0x7f13b10caebd]
2015-11-11 20:50:50 /usr/lib64/maxscale/libreadwritesplit.so(+0xac03) [0x7f13b10cac03]
2015-11-11 20:50:50 /usr/lib64/maxscale/libreadwritesplit.so(+0x97ce) [0x7f13b10c97ce]
2015-11-11 20:50:50 /usr/lib64/maxscale/libreadwritesplit.so(+0xc4c9) [0x7f13b10cc4c9]
2015-11-11 20:50:50 /usr/lib64/maxscale/libreadwritesplit.so(+0xc227) [0x7f13b10cc227]
2015-11-11 20:50:50 /usr/lib64/maxscale/libMySQLBackend.so(+0x6264) [0x7f13801ee264]
2015-11-11 20:50:50 /usr/bin/maxscale() [0x539521]
2015-11-11 20:50:50 /usr/bin/maxscale(poll_waitevents+0x634) [0x5389e8]
2015-11-11 20:50:50 /lib64/libpthread.so.0(+0x7a51) [0x7f13b7e1da51]
2015-11-11 20:50:50 /lib64/libc.so.6(clone+0x6d) [0x7f13b678e9ad]

Comment by markus makela [ 2015-11-14 ]

The first thing I'd suggest is to test with MaxScale 1.2.1 and see whether the crash still happens.
