[MXS-198] MaxScale received fatal signal 11 Created: 2015-06-15  Updated: 2023-01-10  Resolved: 2015-11-22

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 1.1.1
Fix Version/s: 1.3.0

Type: Bug Priority: Blocker
Reporter: Tibs Assignee: markus makela
Resolution: Cannot Reproduce Votes: 1
Labels: None
Environment:

Centos 6.5


Issue Links:
Relates
relates to MXS-329 The session pointer in a DCB can be n... Closed

 Description   

Hi,

I'm using the MaxScale proxy on more than 40 boxes, and now I've got a weird error on one of them. Oops:
2015-06-15 12:30:05 Error : Write failed, dcb is closed.
2015-06-15 12:30:05 Fatal: MaxScale received fatal signal 11. Attempting backtrace.
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/bin/maxscale() [0x52189c]
2015-06-15 12:30:05 /lib64/libpthread.so.0(+0xf710) [0x7fa206689710]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/bin/maxscale(spinlock_acquire+0x1d) [0x521569]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/modules/libMySQLBackend.so(+0x44a3) [0x7fa1e816b4a3]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/bin/maxscale(dcb_close+0x332) [0x527f43]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/modules/libreadconnroute.so(+0x20c8) [0x7fa2005ea0c8]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/modules/libMySQLClient.so(+0x4b9b) [0x7fa1eaf81b9b]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/bin/maxscale(dcb_close+0x332) [0x527f43]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/modules/libMySQLClient.so(+0x3970) [0x7fa1eaf80970]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/bin/maxscale() [0x533ac1]
2015-06-15 12:30:05 /usr/local/mariadb-maxscale/bin/maxscale(poll_waitevents+0x634) [0x533380]
2015-06-15 12:30:05 /lib64/libpthread.so.0(+0x79d1) [0x7fa2066819d1]
2015-06-15 12:30:05 /lib64/libc.so.6(clone+0x6d) [0x7fa204ff286d]

After I restarted MaxScale everything was fine again, but I don't know what caused the problem.
I don't use read-write splitting; I only use the "read connection router" functionality.

If you need anything else just let me know.
Tibi



 Comments   
Comment by Dipti Joshi (Inactive) [ 2015-06-15 ]

Can you attach your MaxScale.cnf file?

Also, when you say MaxScale on 40 boxes, can you describe your configuration? How do the client applications communicate with each MaxScale? How many database servers are behind each MaxScale? And, out of curiosity, why do you have 40 boxes running MaxScale?

Comment by Tibs [ 2015-06-15 ]

Hi Dipti,

Thank you for your answer. We run MaxScale on the app servers too, and it listens on 127.0.0.1, so the applications connect to MaxScale via 127.0.0.1. Behind each MaxScale we have 4-6 MySQL boxes (we have many more MySQL servers, but for now we are still testing MaxScale). If we lose a MySQL server, the app servers can keep working without any problem, because MaxScale will not send traffic to the failed server.

As I wrote, we are still testing; the configuration may change in the future, but this is how we use it now.
Here is our current conf:

[maxscale]
threads=4

[MySQL Monitor]
type=monitor
module=mysqlmon
servers= server1,server2,server3,server4,server5
user=xxxxxxx
passwd=xxxxxxx
monitor_interval=1000
#backend_connect_timeout=
#backend_read_timeout=
#backend_write_timeout=
#detect_replication_lag=
#detect_stale_master=

[qla]
type=filter
module=qlafilter
options=/tmp/QueryLog

[fetch]
type=filter
module=regexfilter
match=fetch
replace=select

#[Write Service]
#type=service
#router=readconnroute
#router_options=master
#servers=server1
#user=
#pass=

[Read Connection Router]
type=service
router=readconnroute
servers=server2,server3,server4,server5
user=xxxxxx
passwd=xxxxxxx
router_options=slave
localhost_match_wildcard_host=1

[Debug Interface]
type=service
router=debugcli

[CLI]
type=service
router=cli

#[Write Listener]
#type=listener
#service=Write Service
#protocol=MySQLClient
#port=4306
#socket=/tmp/ClusterMaster

[Read Connection Listener]
type=listener
service=Read Connection Router
protocol=MySQLClient
address=127.0.0.1
port=3306
#socket=/tmp/readconn.sock

[Debug Listener]
type=listener
service=Debug Interface
protocol=telnetd
#address=127.0.0.1
port=4442

[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
#address=localhost
port=6603

[server1]
type=server
address=server1
port=3306
protocol=MySQLBackend

[server2]
type=server
address=server2
port=3306
protocol=MySQLBackend
[server3]
type=server
address=server3
port=3306
protocol=MySQLBackend

[server4]
type=server
address=server4
port=3306
protocol=MySQLBackend

[server5]
type=server
address=server5
port=3306
protocol=MySQLBackend

If you need anything else, just let me know.

Comment by Geoff Montee (Inactive) [ 2015-06-17 ]

I've seen a similar, but slightly different, backtrace from MaxScale 1.1.1:

2015-06-16 15:37:07 Fatal: MaxScale received fatal signal 11. Attempting backtrace.
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale() [0x5476d8]
2015-06-16 15:37:07 /lib64/libpthread.so.0(+0xf130) [0x7f2ed18f7130]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x3d91) [0x7f2eb79fbd91]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x4e82) [0x7f2eb79fce82]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x495c) [0x7f2eb79fc95c]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libMySQLClient.so(+0x4c73) [0x7f2eb65cdc73]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libMySQLClient.so(+0x3a3e) [0x7f2eb65cca3e]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale() [0x5596d2]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale(poll_waitevents+0x616) [0x558f9b]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale(main+0x1a10) [0x54a1a3]
2015-06-16 15:37:07 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2ed00e9af5]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale() [0x54651d]

Comment by Geoff Montee (Inactive) [ 2015-06-17 ]

Similar backtrace submitted to MXS-207 in case they are separate bugs:

https://mariadb.atlassian.net/browse/MXS-207

Comment by Matt wells [ 2015-06-17 ]

I think we are seeing this issue happen at the same time a backup kicks off. Maybe it has something to do with wsrep_desync turning on and off? We've had MaxScale fail three times in the last 12 hours.

Comment by Dipti Joshi (Inactive) [ 2015-06-17 ]

matt131 We created a separate Jira item, MXS-207, for your scenario, since this ticket uses a master/slave setup whereas you seem to use Galera.

Comment by Dipti Joshi (Inactive) [ 2015-06-17 ]

Tibs Can you please attach the core dump file?

Comment by Dipti Joshi (Inactive) [ 2015-06-21 ]

Tibs Please generate a core dump and attach it to this issue. Please also attach the log files.

Do the following to enable core dumps:

ulimit -c unlimited
export DAEMON_COREFILE_LIMIT='unlimited'
echo /tmp/core-%e-%s-%u-%g-%p-%t > /proc/sys/kernel/core_pattern
echo 2 > /proc/sys/fs/suid_dumpable
service maxscale restart

After this, run the traffic that causes the fatal signal 11; you should find the core file in the /tmp directory, named core-maxscale-*.
Attach the latest such core file to this Jira item.

Comment by Dipti Joshi (Inactive) [ 2015-06-24 ]

Tibs Please provide the exact OS version of your system.

Comment by Dipti Joshi (Inactive) [ 2015-06-26 ]

Tibs Thanks for the OS version.

You can download a MaxScale debug build RPM from http://maxscale-jenkins.mariadb.com/repository/1.1.1-debug/mariadb-maxscale/yum/centos6/x86_64/

Change the repo config in /etc/yum.repos.d/maxscale.repo:

[maxscale]
name=maxscale
baseurl=http://maxscale-jenkins.mariadb.com/repository/1.1.1-debug/mariadb-maxscale/yum/centos/6/x86_64
enabled=1
gpgcheck=false

Then reinstall MaxScale:

yum reinstall maxscale

Next, enable core dumps:

ulimit -c unlimited
export DAEMON_COREFILE_LIMIT='unlimited'
echo /tmp/core-%e-%s-%u-%g-%p-%t > /proc/sys/kernel/core_pattern
echo 2 > /proc/sys/fs/suid_dumpable
service maxscale restart

After this, run the traffic that causes the fatal signal 11; you should find the core file in the /tmp directory, named core-maxscale-*.
Attach the latest such core file to this Jira item.

Comment by markus makela [ 2015-06-26 ]

It is better to uninstall MaxScale and then reinstall it. This guarantees that the right version is installed.

yum remove maxscale
yum install maxscale

Comment by Dipti Joshi (Inactive) [ 2015-08-17 ]

markus makela Can you pinpoint where these two locations, libMySQLBackend.so(+0x44a3) and libreadconnroute.so(+0x20c8), are in the source code (1.1.1 on CentOS 6.5)?

Comment by markus makela [ 2015-08-17 ]

By the looks of it, the crash occurred when the backend DCB was closed and the session spinlock was being acquired. In the 1.1.1 source code, the DCB is closed on line 673 of readconnroute.c, the protocol closing function is called on line 1279 of dcb.c, and the spinlock is acquired on line 1162 of mysql_backend.c.

Since the crash occurs on the acquisition of the spinlock, it is possible that the session pointer in that function is invalid. A likely cause for this is a corrupt DCB.

Comment by Johan Wikman [ 2015-09-02 ]

More readable stacktrace:

/usr/local/mariadb-maxscale/bin/maxscale() [0x52189c]
/lib64/libpthread.so.0(+0xf710) [0x7fa206689710]
/usr/local/mariadb-maxscale/bin/maxscale(spinlock_acquire+0x1d) [0x521569]
/home/ec2-user/workspace/server/modules/protocol/mysql_backend.c:1167
/usr/local/mariadb-maxscale/bin/maxscale(dcb_close+0x332) [0x527f43]
/home/ec2-user/workspace/server/modules/routing/readconnroute.c:676
/home/ec2-user/workspace/server/modules/protocol/mysql_client.c:1489
/usr/local/mariadb-maxscale/bin/maxscale(dcb_close+0x332) [0x527f43]
/home/ec2-user/workspace/server/modules/protocol/mysql_client.c:925
/usr/local/mariadb-maxscale/bin/maxscale() [0x533ac1]
/usr/local/mariadb-maxscale/bin/maxscale(poll_waitevents+0x634) [0x533380]
/lib64/libpthread.so.0(+0x79d1) [0x7fa2066819d1]
/lib64/libc.so.6(clone+0x6d) [0x7fa204ff286d]

Comment by Dipti Joshi (Inactive) [ 2015-09-12 ]

Per Johan, this Jira issue will be resolved when MXS-329 is closed.

Comment by Johan Wikman [ 2015-11-22 ]

This was experienced with 1.1.1, which is by now an old version. As it could not readily be reproduced, and as there have since been significant changes in the areas likely related to this issue, it will now be closed.

Generated at Thu Feb 08 03:57:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.