[MXS-200] MaxScale crashes with backtrace Created: 2015-06-15  Updated: 2015-12-15  Resolved: 2015-12-15

Status: Closed
Project: MariaDB MaxScale
Component/s: galeramon, readwritesplit
Affects Version/s: 1.1.1, 1.2.1
Fix Version/s: 1.3.0

Type: Bug Priority: Blocker
Reporter: Julian G Assignee: Johan Wikman
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Debian 7.8 AMD64, KVM Virtualization
MaxScale-Repository: http://code.mariadb.com/mariadb-maxscale/1.1/repo/debian


Attachments: MaxScale.cnf, maxscale.log
Issue Links:
Relates
relates to MXS-358 Crash, Error in `/usr/bin/maxscale':... Closed

 Description   

We are using MaxScale on a Linux server with Apache + PHP5 serving a webshop. Shortly after we start a load test, the MaxScale daemon crashes with the attached backtrace. We were also able to reproduce the crash on the second, identical web server.



 Comments   
Comment by Dipti Joshi (Inactive) [ 2015-07-08 ]

jgold, can you provide us with a core dump?

Comment by Julian G [ 2015-07-09 ]

What steps are needed to produce a core dump?

Comment by Julian G [ 2015-07-22 ]

I did the following steps but didn't get a core file in /tmp. Am I missing something?

ulimit -c unlimited
export DAEMON_COREFILE_LIMIT='unlimited'
echo /tmp/core-%e-%s-%u-%g-%p-%t > /proc/sys/kernel/core_pattern
echo 2 > /proc/sys/fs/suid_dumpable
service maxscale restart
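A minimal sketch for checking the outcome of these steps, assuming a POSIX shell, `pidof` and `awk` on the host, and a single running process named `maxscale`. The limit that decides whether a core is written is the one the running daemon actually has, which may differ from the interactive shell's if the init script or daemon changes it, so the sketch reads it from `/proc/<pid>/limits`. `core_soft_limit` and `check_core_dumps` are hypothetical helper names, not part of MaxScale.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MaxScale) for checking core-dump settings.

# Print the soft "Max core file size" limit from a /proc/<pid>/limits-style file.
# The line reads: Max core file size <soft> <hard> <units>, so the soft limit is field 5.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Report the settings that decide whether, and where, the running daemon dumps core.
# Assumes the process name is "maxscale"; -s makes pidof return a single PID.
check_core_dumps() {
    pid=$(pidof -s maxscale) || { echo "maxscale is not running" >&2; return 1; }
    limit=$(core_soft_limit "/proc/$pid/limits")
    echo "core soft limit (PID $pid): $limit"
    if [ "$limit" = "0" ]; then
        echo "WARNING: the daemon's core limit is 0, so no core file will be written"
    fi
    echo "core_pattern:  $(cat /proc/sys/kernel/core_pattern)"
    echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
}
```

Run `check_core_dumps` as root after `service maxscale restart`. If the limit is non-zero and still no core appears, one further possibility (an assumption, not verified against the 1.1 sources) is that MaxScale's own fatal-signal handler, which prints the backtrace, must let the signal reach its default action for a core to be produced.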

Comment by markus makela [ 2015-08-21 ]

jgold, if you are willing to test this with the debug version of MaxScale 1.2, we could get more information about what's going wrong. You can get the debug build by following the instructions in this comment.

Comment by markus makela [ 2015-08-21 ]

The crash happens on line 1049 of dcb.c, while the buffer being written to the client is being consumed.

Comment by martin brampton (Inactive) [ 2015-08-28 ]

The crash is triggered by a SIGABRT, and given that the trace shows calls to gwbuf_consume and gwbuf_free, with a further reference to cfree, it seems very likely that it is the result of freeing the same memory twice.

This could conceivably be caused by processing a DCB that has been killed by the zombie processing in another thread. However, the zombie mechanism should prevent it. Changes made for version 1.3 eliminate a small gap in the logic that could have caused a problem, although it is surprising that a small timing-related issue would recur with any regularity.

Overall, a lot of extra work on checking for areas of risk in the basic DCB mechanisms has been done. It's likely to be difficult to make further progress with this specific problem.

Comment by Johan Wikman [ 2015-09-01 ]

Stacktrace:

/home/ec2-user/workspace/server/core/gateway.c:263
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7facd0f540a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7faccf7fb165]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7faccf7fe3e0]
/lib/x86_64-linux-gnu/libc.so.6(+0x6c39b) [0x7faccf83539b]
/lib/x86_64-linux-gnu/libc.so.6(+0x75be6) [0x7faccf83ebe6]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c) [0x7faccf84398c]
/home/ec2-user/workspace/server/core/buffer.c:165
/home/ec2-user/workspace/server/core/buffer.c:383
/home/ec2-user/workspace/server/core/dcb.c:1049
/home/ec2-user/workspace/server/modules/protocol/mysql_client.c:558
/home/ec2-user/workspace/server/core/session.c:880
/home/ec2-user/workspace/server/modules/routing/readwritesplit/readwritesplit.c:2831
/home/ec2-user/workspace/server/modules/protocol/mysql_backend.c:568
/home/ec2-user/workspace/server/core/poll.c:877
/home/ec2-user/workspace/server/core/poll.c:609
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7facd0f4bb50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7faccf8a495d]

Comment by Johan Wikman [ 2015-09-01 ]

jgold, this crash is most likely caused by a bug that has been fixed in 1.2. It would be great if you could try that out.

Comment by Johan Wikman [ 2015-09-07 ]

This is caused by the same bug outlined in MXS-337.

It is fixed in release 1.2. If an upgrade is not possible, then this needs to be backported to 1.1.1.

Comment by Dipti Joshi (Inactive) [ 2015-09-07 ]

johan.wikman, since this has been fixed in 1.2, a better resolution would have been "Fixed" with a fixVersion of 1.2.

Comment by Johan Wikman [ 2015-09-07 ]

Reopened only to change from Won't Fix to Fixed.

Comment by Julian G [ 2015-09-24 ]

MaxScale is still crashing.

Here are the last few lines of error.log:

2015-09-23 16:30:24 Backend hangup -> closing session.
2015-09-23 16:30:27 Client hangup error handling.
2015-09-23 16:30:29 Client error event handling.
2015-09-23 16:30:30 Backend hangup error handling.
2015-09-23 16:30:30 Backend hangup -> closing session.
2015-09-23 16:30:30 Backend hangup error handling.
2015-09-23 16:30:30 Backend hangup error handling.
2015-09-23 16:30:30 Backend hangup error handling.
2015-09-23 16:30:30 Backend hangup -> closing session.
2015-09-23 16:30:30 Backend hangup -> closing session.
2015-09-23 16:30:30 Error : Unable to write to backend due to authentication failure.
2015-09-23 16:30:31 Client hangup error handling.
2015-09-23 16:30:32 debug assert /home/vagrant/workspace/server/modules/routing/readwritesplit/readwritesplit.c:2491
2015-09-23 16:30:32 Fatal: MaxScale received fatal signal 6. Attempting backtrace.
2015-09-23 16:30:32 /usr/bin/maxscale() [0x544b6f]
2015-09-23 16:30:32 /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7fd5567700a0]
2015-09-23 16:30:32 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fd555017165]
2015-09-23 16:30:32 /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7fd55501a3e0]
2015-09-23 16:30:32 /lib/x86_64-linux-gnu/libc.so.6(__assert_fail+0xf1) [0x7fd555010311]
2015-09-23 16:30:32 /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(+0x97e1) [0x7fd5407867e1]
2015-09-23 16:30:32 /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(+0x7be9) [0x7fd540784be9]
2015-09-23 16:30:32 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so(+0x981d) [0x7fd53f91181d]
2015-09-23 16:30:32 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so(+0x774a) [0x7fd53f90f74a]
2015-09-23 16:30:32 /usr/bin/maxscale() [0x55cbf2]
2015-09-23 16:30:32 /usr/bin/maxscale(poll_waitevents+0x6d9) [0x55c1c9]
2015-09-23 16:30:32 /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7fd556767b50]
2015-09-23 16:30:32 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd5550c095d]

Should I open a new issue?

Comment by Johan Wikman [ 2015-09-24 ]

jgold, what version are you running now? Still 1.1.1, or have you upgraded to 1.2?

Comment by Julian G [ 2015-09-24 ]

I've upgraded to 1.2 and also tested with 1.2-debug.

Comment by Johan Wikman [ 2015-09-24 ]

Could you try the release-1.2.1 branch?

It contains a number of fixes and will eventually be released as 1.2.1.

Comment by Julian G [ 2015-09-24 ]

Is there a repository where I can get .deb packages, or do I need to compile it from GitHub?

Comment by Johan Wikman [ 2015-09-24 ]

jgold, you can download packages from here: http://maxscale-jenkins.mariadb.com/ci-repository/release-1.2.1/mariadb-maxscale/

Comment by Julian G [ 2015-10-06 ]

It's still crashing:

2015-10-06 10:48:02 debug assert /home/vagrant/workspace/server/modules/routing/readwritesplit/readwritesplit.c:2584
2015-10-06 10:48:02 Fatal: MaxScale 1.2.1 received fatal signal 6. Attempting backtrace.
2015-10-06 10:48:02 Commit ID: 36b89e7ec484261ce8335b2c8d8e596415beb964 System name: Linux Release string: undefined Embedded library version: 5.5.42-MariaDB
2015-10-06 10:48:02 /usr/bin/maxscale() [0x544d22]
2015-10-06 10:48:02 /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7ff30f9000a0]
2015-10-06 10:48:02 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7ff30e1a7165]
2015-10-06 10:48:02 /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7ff30e1aa3e0]
2015-10-06 10:48:02 /lib/x86_64-linux-gnu/libc.so.6(__assert_fail+0xf1) [0x7ff30e1a0311]
2015-10-06 10:48:02 /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(+0x9b63) [0x7ff2f92eab63]
2015-10-06 10:48:02 /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(+0x7ebc) [0x7ff2f92e8ebc]
2015-10-06 10:48:02 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so(+0x987c) [0x7ff2f847587c]
2015-10-06 10:48:02 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so(+0x77a9) [0x7ff2f84737a9]
2015-10-06 10:48:02 /usr/bin/maxscale() [0x55cf4e]
2015-10-06 10:48:02 /usr/bin/maxscale(poll_waitevents+0x6d9) [0x55c525]
2015-10-06 10:48:02 /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7ff30f8f7b50]
2015-10-06 10:48:02 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ff30e25095d]

Comment by Johan Wikman [ 2015-10-06 ]

Did you build it yourself or did you download a package?

Comment by Julian G [ 2015-10-06 ]

I've installed the packages from http://maxscale-jenkins.mariadb.com/ci-repository/release-1.2.1/mariadb-maxscale/

Comment by Johan Wikman [ 2015-10-06 ]

Right, that's from our build directory, and based on the commit ID you seem to be running something older than the final release; it also appears to be a debug build. Please download 1.2.1 from here https://mariadb.com/my_portal/download/maxscale and give it another shot.
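The commit ID printed in the fatal-signal log identifies the exact build that crashed. A small sketch, assuming a POSIX shell with `sed` and a local git clone of the MaxScale sources; `commit_ids` and `build_is_in` are hypothetical helper names, and the ref to compare against (for example the release-1.2.1 branch mentioned above, if it still exists in the clone) is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers for identifying which source revision a crash log came from.

# Print each 40-hex-digit commit hash that follows "Commit ID:" on standard input.
commit_ids() {
    sed -n 's/.*Commit ID: \([0-9a-f]\{40\}\).*/\1/p'
}

# Succeed if commit $1 is an ancestor of (or equal to) ref $2, i.e. every change in the
# logged build is also contained in $2. Must be run inside a git clone of the sources.
build_is_in() {
    git merge-base --is-ancestor "$1" "$2"
}
```

For example, `commit_ids < error.log | while read -r c; do build_is_in "$c" origin/release-1.2.1 && echo "$c is already part of release-1.2.1"; done` lists logged builds whose history is included in that branch.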

Comment by Julian G [ 2015-10-06 ]

2015-10-06 15:16:36 Fatal: MaxScale 1.2.1 received fatal signal 6. Attempting backtrace.
2015-10-06 15:16:36 Commit ID: 7bfda81b098bfd0db3c306495725d1910294d2e8 System name: Linux Release string: undefined Embedded library version: 5.5.42-MariaDB
2015-10-06 15:16:36 /usr/bin/maxscale() [0x543f8a]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f0c1704e0a0]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f0c158f5165]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7f0c158f83e0]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libc.so.6(+0x6c39b) [0x7f0c1592f39b]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libc.so.6(+0x75be6) [0x7f0c15938be6]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c) [0x7f0c1593d98c]
2015-10-06 15:16:36 /usr/bin/maxscale(gwbuf_free+0x41) [0x542efc]
2015-10-06 15:16:36 /usr/bin/maxscale(gwbuf_consume+0xaa) [0x543476]
2015-10-06 15:16:36 /usr/bin/maxscale(dcb_write+0x829) [0x54ab2e]
2015-10-06 15:16:36 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLBackend.so(gw_send_authentication_to_backend+0x515) [0x7f0bf8021d66]
2015-10-06 15:16:36 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLBackend.so(+0x4dec) [0x7f0bf801ddec]
2015-10-06 15:16:36 /usr/bin/maxscale() [0x5598ea]
2015-10-06 15:16:36 /usr/bin/maxscale(poll_waitevents+0x61b) [0x5591b1]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f0c17045b50]
2015-10-06 15:16:36 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0c1599e95d]

Comment by Johan Wikman [ 2015-10-06 ]

That's unfortunate. Thanks for reporting. We'll investigate.

Comment by Dipti Joshi (Inactive) [ 2015-10-06 ]

johan.wikman, does this new crash now look the same as MXS-379?

Comment by Johan Wikman [ 2015-10-07 ]

Translated stacktrace:

/home/vagrant/workspace/server/core/gateway.c:362
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f0c1704e0a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f0c158f5165]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7f0c158f83e0]
/lib/x86_64-linux-gnu/libc.so.6(+0x6c39b) [0x7f0c1592f39b]
/lib/x86_64-linux-gnu/libc.so.6(+0x75be6) [0x7f0c15938be6]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c) [0x7f0c1593d98c]
/home/vagrant/workspace/server/core/buffer.c:140
/home/vagrant/workspace/server/core/buffer.c:384
/home/vagrant/workspace/server/core/dcb.c:1313
/home/vagrant/workspace/server/modules/protocol/mysql_common.c:728
/home/vagrant/workspace/server/modules/protocol/mysql_backend.c:228
/home/vagrant/workspace/server/core/poll.c:878
/home/vagrant/workspace/server/core/poll.c:610
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f0c17045b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0c1599e95d]
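Translating raw frames into file:line form, as in the translated stacktrace above, can be approximated with binutils' `addr2line`, provided the binaries carry debug information. A sketch in POSIX shell; `frame_args` and `translate_backtrace` are hypothetical names. It assumes `/usr/bin/maxscale` is a non-PIE executable (its low addresses such as 0x543f8a suggest so), so the bracketed address can be used directly, while in shared objects only the `(+offset)` form can be resolved without knowing the load address.

```shell
#!/bin/sh
# Hypothetical helpers: turn backtrace lines from the MaxScale log into addr2line calls.

# Print "<object> <address>" for one backtrace line, or fail if it cannot be resolved offline.
frame_args() {
    frame="/${1#*/}"                            # drop the "YYYY-MM-DD HH:MM:SS " log prefix
    case $frame in
        *'('*')'*'[0x'*']'*) ;;                  # expect "object(inner) [0xADDR]"
        *) return 1 ;;
    esac
    obj=${frame%%'('*}                          # object path before "("
    inner=${frame#*'('}; inner=${inner%%')'*}   # "", "+0xOFF" or "symbol+0xOFF"
    addr=${frame##*'['}; addr=${addr%%']'*}     # runtime address
    case $inner in
        +0x*) echo "$obj ${inner#+}" ;;         # offset relative to the object itself
        *) case $obj in
               *.so|*.so.*) return 1 ;;         # symbol+offset in a shared object: load base unknown
               *) echo "$obj $addr" ;;          # non-PIE executable: runtime == link-time address
           esac ;;
    esac
}

# Read a pasted backtrace on stdin and print the source location of each resolvable frame.
translate_backtrace() {
    while IFS= read -r line; do
        if args=$(frame_args "$line"); then
            set -- $args
            addr2line -e "$1" "$2"
        else
            printf '%s\n' "$line"               # leave unresolvable lines unchanged
        fi
    done
}
```

Usage: paste the log lines into a file and run `translate_backtrace < backtrace.txt`. Frames in objects without debug information come out as `??:0`.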

Comment by Johan Wikman [ 2015-10-07 ]

jgold, if it's not too inconvenient, could you please attach a core file?
Instructions for generating a core file can be found here: https://github.com/mariadb-corporation/MaxScale/wiki/Obtaining-a-core-file

Comment by Johan Wikman [ 2015-10-07 ]

dshjoshi, no, this is not the same as MXS-379, but it seems to be identical to MXS-358.

Comment by martin brampton (Inactive) [ 2015-10-28 ]

As commented above, the logic of dcb_write has been substantially overhauled, but those changes do not yet appear to be in any release. It would be most helpful to know whether the problem still exists with the newer code. At present, the best code for this appears to be branch MXS-329. If the newer code does not fix the problem, its greater clarity will make diagnosis easier.

Comment by Julian G [ 2015-10-28 ]

Are there Debian packages available? I've found http://maxscale-jenkins.mariadb.com/ci-repository/MXS-329/ but it only contains CentOS packages.

Comment by Jonathan Frank [ 2015-12-10 ]

We deployed MaxScale into production yesterday and have the same problem as reported here. For now, we have reverted to HAProxy. As for the operating system, we are using Ubuntu Trusty 14.04 64-bit.

The backtrace looks the same as the one posted here:

2015-12-10 14:03:30 Fatal: MaxScale 1.2.1 received fatal signal 6. Attempting backtrace.
2015-12-10 14:03:30 Commit ID: 7bfda81b098bfd0db3c306495725d1910294d2e8 System name: Linux Release string: Ubuntu 14.04.2 LTS Embedded library version: 5.5.42-MariaDB
2015-12-10 14:03:30 /usr/bin/maxscale() [0x549855]
2015-12-10 14:03:30 /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f2daa63d340]
2015-12-10 14:03:30 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f2da9275cc9]
2015-12-10 14:03:30 /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f2da92790d8]
2015-12-10 14:03:30 /lib/x86_64-linux-gnu/libc.so.6(+0x73394) [0x7f2da92b2394]
2015-12-10 14:03:30 /lib/x86_64-linux-gnu/libc.so.6(+0x7f66e) [0x7f2da92be66e]
2015-12-10 14:03:30 /usr/bin/maxscale(gwbuf_free+0x3d) [0x5487e5]
2015-12-10 14:03:30 /usr/bin/maxscale(gwbuf_consume+0xa6) [0x548d55]
2015-12-10 14:03:30 /usr/bin/maxscale(dcb_write+0x835) [0x550717]
2015-12-10 14:03:30 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLBackend.so(gw_send_authentication_to_backend+0x648) [0x7f2da0257ad4]
2015-12-10 14:03:30 /usr/lib/x86_64-linux-gnu/maxscale/libMySQLBackend.so(+0x479f) [0x7f2da025379f]
2015-12-10 14:03:30 /usr/bin/maxscale() [0x55fcca]
2015-12-10 14:03:30 /usr/bin/maxscale(poll_waitevents+0x6d4) [0x55f57b]
2015-12-10 14:03:30 /usr/bin/maxscale(main+0x1dc5) [0x54c42d]
2015-12-10 14:03:30 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f2da9260ec5]
2015-12-10 14:03:30 /usr/bin/maxscale() [0x54852a]

Comment by Johan Wikman [ 2015-12-10 ]

It does indeed look very similar. However, although we tried, we were never able to reproduce this exact crash.

We will shortly release 1.3 beta, in which a number of concurrency issues have been addressed. Hopefully they will make this one disappear. I will let you know when an Ubuntu version is available.

Comment by Johan Wikman [ 2015-12-15 ]

Even though the evidence clearly shows that there is a problem, I will close this now as we haven't been able to reproduce it in-house. Also, the expectation is that the problem is caused by some of the concurrency issues that have been corrected in 1.3.

If the problem is still present with 1.3, please reopen this one or create a new report.

1.3 beta is available at: http://maxscale-jenkins.mariadb.com/ci-repository/1.3.0-beta-release/

Generated at Thu Feb 08 03:57:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.