[MXS-207] MaxScale received fatal signal 11 (libreadwritesplit) Created: 2015-06-17  Updated: 2015-10-19  Resolved: 2015-08-31

Status: Closed
Project: MariaDB MaxScale
Component/s: Core, readwritesplit
Affects Version/s: 1.1.1
Fix Version/s: 1.2.1

Type: Bug Priority: Blocker
Reporter: Geoff Montee (Inactive) Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MaxScale.cnf     File cluster.cnf     Text File maxscale-error.log     File my.cnf    
Issue Links:
Duplicate
is duplicated by MXS-338 MaxScale 1.1.1 crashed with Signal 11 Closed
Relates
relates to MXS-414 Maxscale crashed every day! Closed
relates to MXS-329 The session pointer in a DCB can be n... Closed

 Description   

Similar to MXS-198.

2015-06-16 15:37:07 Fatal: MaxScale received fatal signal 11. Attempting backtrace.
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale() [0x5476d8]
2015-06-16 15:37:07 /lib64/libpthread.so.0(+0xf130) [0x7f2ed18f7130]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x3d91) [0x7f2eb79fbd91]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x4e82) [0x7f2eb79fce82]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x495c) [0x7f2eb79fc95c]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libMySQLClient.so(+0x4c73) [0x7f2eb65cdc73]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/modules/libMySQLClient.so(+0x3a3e) [0x7f2eb65cca3e]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale() [0x5596d2]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale(poll_waitevents+0x616) [0x558f9b]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale(main+0x1a10) [0x54a1a3]
2015-06-16 15:37:07 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2ed00e9af5]
2015-06-16 15:37:07 /usr/local/mariadb-maxscale/bin/maxscale() [0x54651d]



 Comments   
Comment by Dipti Joshi (Inactive) [ 2015-06-17 ]

GeoffMontee, please attach MaxScale.cnf.

Comment by Dipti Joshi (Inactive) [ 2015-06-17 ]

Massimiliano Pinto, markus makela: please analyze this ticket.

Comment by Dipti Joshi (Inactive) [ 2015-06-17 ]

Comment by matt131:
I think we are seeing this issue happen at the same time a backup kicks off. Maybe this could have something to do with wsrep_desync turning on and off? We've had MaxScale fail 3 times in the last 12 hours.

Comment by Matt wells [ 2015-06-18 ]

Just to confirm, we've disabled backups and the problem has stopped. This issue has something to do with how backups were being run.

Do we have any insight from the application team?

-Matt

Comment by Dipti Joshi (Inactive) [ 2015-06-18 ]

matt131, can you please locate the core file of the crashed MaxScale in your environment and provide it to us?

Comment by Matt wells [ 2015-06-18 ]

Hi,

Do you have instructions on finding that file?

-Matt

Comment by markus makela [ 2015-06-18 ]

I tested with MaxScale and a four-node Galera cluster. I manually set wsrep_desync to ON on master and slave nodes and could not reproduce the crash. I also tried doing a backup on master and slave nodes with both xtrabackup and mariadb-backup and could not reproduce the crash.

If you can provide more information about the backup method, that would be much appreciated.

Comment by markus makela [ 2015-06-25 ]

Repeated the previous tests while also blocking network traffic to various nodes. I could not repeat the crash.

Comment by Timofey Turenko [ 2015-06-29 ]

The following information is needed:

  • MaxScale configuration (MaxScale.cnf file)
  • Linux distribution version
  • backend configuration (is it Master/slave or Galera?)
  • backend server parameters (my.cnf)
  • backend MariaDB/MySQL version

Comment by Dipti Joshi (Inactive) [ 2015-06-29 ]

tturenko Please check in eventum 9510

Comment by Guillaume Lefranc [ 2015-07-18 ]

Hello, what is the status of this issue? Is it fixed in 1.2?

Comment by Bryan Traywick [ 2015-07-24 ]

I've experienced this issue several times as well. Here is the requested info:

  • All servers are running Ubuntu 14.04
  • MaxScale 1.1.1
  • MariaDB Galera Server 10.0.17
  • The backend configuration is a 3 node Galera cluster
  • Using the R/W Splitter

The MaxScale and MariaDB configs are attached, as well as the relevant
snippet of the MaxScale error log.

Comment by Dipti Joshi (Inactive) [ 2015-07-26 ]

bryantraywick, can you install a debug build of MaxScale to reproduce the crash?

Here is what you need to do:

(1) Uninstall the current version of MaxScale and install the debug version

service maxscale stop
sudo apt-get remove maxscale
deb http://downloads.mariadb.com/enterprise/f8rm-9k90/mariadb-maxscale/1.2-debug/ubuntu/  trusty main

(2) Configure MaxScale for your environment

(3) Enable core dumps

ulimit -c unlimited
export DAEMON_COREFILE_LIMIT='unlimited'
echo /tmp/core-%e-%s-%u-%g-%p-%t > /proc/sys/kernel/core_pattern
echo 2 > /proc/sys/fs/suid_dumpable
service maxscale restart

(4) Run the traffic that leads to the crash

The core dump will then be written to /tmp with a name matching core-maxscale-*

Please collect the core dump, the config file and the log files, and attach them to this Jira issue.

The core dump will help us identify the root cause and fix it.
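
For reference, `ulimit -c unlimited` in the core dump step adjusts the core file size limit (RLIMIT_CORE) of the shell and the processes it starts. The same limit can be inspected or raised from inside a process. The following is a minimal sketch, assuming a POSIX system; `raise_core_limit` is a hypothetical helper name and not part of MaxScale. It only raises the soft limit up to the existing hard limit, which an unprivileged process is allowed to do.

```c
#include <sys/resource.h>

/* Hypothetical helper, not MaxScale code: raise the soft core-file size
 * limit to the current hard limit. Returns 0 on success, -1 on failure. */
static int raise_core_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
    {
        return -1;
    }
    lim.rlim_cur = lim.rlim_max; /* the soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

If the hard limit itself is 0, no core file will be written regardless of the soft limit, so it may be worth checking both values when a core dump does not appear.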

Comment by martin brampton (Inactive) [ 2015-08-20 ]

A report from a crash while running a debug build would be extremely helpful for making progress with this fault. Please note that the debug and trace logs should be turned off when running a debug build in production, since otherwise the logs will grow excessively large. This is done through the MaxScale configuration file, something like:

[maxscale]
threads=6
log_debug=0
log_trace=0
 

Comment by Dipti Joshi (Inactive) [ 2015-08-21 ]

markus makela, based on the stack trace, please identify the line number in readwritesplit where this is crashing.

Comment by martin brampton (Inactive) [ 2015-08-28 ]

The failure looks likely to have been at what was line 1651 in release 1.1.1 (1652 in release 1.2) which was:

data = (MYSQL_session*)master_dcb->session->data;

This could be caused by either master_dcb or session being NULL, and the latter is believed to be more likely. MXS-329 is working to guarantee that the session pointer in a DCB cannot be NULL.
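
To illustrate the failure mode described above, here is a minimal sketch using simplified stand-in structures. These are assumptions for illustration: the real MaxScale DCB, SESSION and MYSQL_session types have many more fields, and `get_session_data` is a hypothetical helper name. The sketch checks each pointer in the chain before dereferencing it, so a NULL `master_dcb` or `session` yields NULL instead of a segmentation fault:

```c
#include <stddef.h>

/* Simplified stand-ins for the MaxScale types (assumption: the real
 * structures contain many more fields than shown here). */
typedef struct { const char *db; } MYSQL_session;
typedef struct { void *data; } SESSION;
typedef struct { SESSION *session; } DCB;

/* Hypothetical helper: returns the session data, or NULL if any pointer
 * in the master_dcb -> session chain is NULL. */
static MYSQL_session *get_session_data(const DCB *master_dcb)
{
    if (master_dcb == NULL || master_dcb->session == NULL)
    {
        return NULL;
    }
    return (MYSQL_session *)master_dcb->session->data;
}
```

A caller would then treat a NULL return as "session no longer available" and abandon the operation rather than continue.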

Comment by markus makela [ 2015-08-31 ]

NULL value checks were added to the temporary table functions, and the calls to those functions are now properly done under a spinlock. As was stated before, this only treats the symptoms of NULL DCB/session pointers; the real fix will be done in MXS-329.
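
The general shape of such a change can be sketched as follows. This is an illustrative sketch only, not the actual patch: the `demo_*` types and functions are hypothetical, and a small C11 `atomic_flag` spinlock stands in for MaxScale's own spinlock implementation. The pointers are checked while the lock is held, so they cannot change between the check and the use, assuming any thread that closes the session takes the same lock:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal spinlock built on C11 atomic_flag. */
typedef struct { atomic_flag flag; } demo_spinlock;

static void demo_lock_init(demo_spinlock *l) { atomic_flag_clear(&l->flag); }

static void demo_lock(demo_spinlock *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire)) { }
}

static void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Simplified stand-ins for the session, DCB and router session. */
typedef struct { void *data; } demo_session;
typedef struct { demo_session *session; } demo_dcb;
typedef struct
{
    demo_spinlock lock;
    bool closed;
    demo_dcb *master_dcb;
} demo_router_session;

/* Copies the master session data pointer to *out_data and returns true,
 * or returns false if the session is closed or a pointer is NULL. */
static bool demo_get_master_data(demo_router_session *rses, void **out_data)
{
    bool ok = false;

    if (rses == NULL || out_data == NULL)
    {
        return false;
    }
    demo_lock(&rses->lock);
    if (!rses->closed && rses->master_dcb != NULL &&
        rses->master_dcb->session != NULL)
    {
        *out_data = rses->master_dcb->session->data;
        ok = true;
    }
    demo_unlock(&rses->lock);
    return ok;
}
```

The lock is released on every path before returning, which matters with a spinlock because a forgotten release would leave other threads spinning indefinitely.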
