[MXS-200] MaxScale crashes with backtrace Created: 2015-06-15 Updated: 2015-12-15 Resolved: 2015-12-15 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | galeramon, readwritesplit |
| Affects Version/s: | 1.1.1, 1.2.1 |
| Fix Version/s: | 1.3.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Julian G | Assignee: | Johan Wikman |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: |
Debian 7.8 AMD64, KVM Virtualization |
| Description |
|
We are using MaxScale on a Linux server with Apache + PHP5 serving a webshop. Shortly after starting a load test, the MaxScale daemon crashes with the attached backtrace. We were also able to reproduce the crash on the second, identical web server. |
| Comments |
| Comment by Dipti Joshi (Inactive) [ 2015-07-08 ] | ||||||||||||||||||
|
jgold | ||||||||||||||||||
| Comment by Julian G [ 2015-07-09 ] | ||||||||||||||||||
|
What steps are needed for a coredump? | ||||||||||||||||||
| Comment by Julian G [ 2015-07-22 ] | ||||||||||||||||||
|
I did the following steps but didn't get a core file in /tmp. Am I missing something? ulimit -c unlimited | ||||||||||||||||||
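One possible reason for a missing core file: `ulimit -c unlimited` only affects the shell it is run in and that shell's children, so a daemon started from an init script does not inherit it. The location of the core file is also decided by the kernel's core pattern, not by the process. A minimal sketch, assuming Linux, of how a process can raise its own core limit and inspect the pattern; the helper names `enable_core_dumps` and `read_core_pattern` are hypothetical and are not part of MaxScale:

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit of the
 * calling process to its hard limit. Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return -1;
    }
    lim.rlim_cur = lim.rlim_max;  /* the soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Hypothetical helper: copy the kernel core pattern (Linux-specific)
 * into buf. Returns 0 on success, -1 if it cannot be read. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f;

    if (buf == NULL || len == 0) {
        return -1;
    }
    f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL) {
        return -1;
    }
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

If the hard limit itself is 0, the soft limit cannot be raised this way and the limit has to be changed in whatever starts the daemon. A core file only lands in /tmp if the core pattern points there.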
| Comment by markus makela [ 2015-08-21 ] | ||||||||||||||||||
|
jgold If you would be willing to test this with the debug version of MaxScale 1.2 we could get more information about what's going wrong. You can get the debug build by following the instructions in this comment | ||||||||||||||||||
| Comment by markus makela [ 2015-08-21 ] | ||||||||||||||||||
|
The crash happens when the buffer being written to the client is being consumed on line 1049 in dcb.c | ||||||||||||||||||
| Comment by martin brampton (Inactive) [ 2015-08-28 ] | ||||||||||||||||||
|
The crash is triggered by a SIGABRT and, given that the trace shows calls to gwbuf_consume and gwbuf_free, with a further reference to cfree, it seems very likely that it is the result of freeing the same memory twice. This could conceivably be caused by processing a DCB that has been killed by the zombie processing in another thread. However, the zombie mechanism should prevent it. Changes made for version 1.3 eliminate a small gap in the logic that could have caused a problem, although it is surprising that a small timing-related issue would repeat with any regularity. Overall, a lot of extra work on checking for areas of risk in the basic DCB mechanisms has been done. It's likely to be difficult to make further progress with this specific problem. | ||||||||||||||||||
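To illustrate the kind of double free described above, here is a minimal sketch of a consume-and-free buffer chain. It is not MaxScale's code: `buf_t`, `conn_t`, `buf_alloc`, `buf_consume` and `conn_drain` are hypothetical stand-ins for GWBUF, DCB and their helpers. The point is that consuming frees exhausted buffers, so any other holder of the old head pointer (for example another thread working on the same connection) would free it a second time, and glibc aborts the process with SIGABRT when it detects that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a chained network buffer (not MaxScale's GWBUF). */
typedef struct buf {
    struct buf *next;
    size_t      len;   /* unread bytes left in this buffer */
} buf_t;

/* Hypothetical stand-in for a connection that owns a write queue. */
typedef struct conn {
    buf_t *writeq;
} conn_t;

/* Allocate a buffer holding len bytes; returns NULL on allocation failure. */
buf_t *buf_alloc(size_t len)
{
    buf_t *b = malloc(sizeof(*b));
    if (b != NULL) {
        b->next = NULL;
        b->len = len;
    }
    return b;
}

/* Consume n bytes from the chain. Every fully consumed buffer is freed,
 * and the new head is returned. The caller's old head pointer is invalid
 * afterwards and must not be consumed or freed again. */
buf_t *buf_consume(buf_t *head, size_t n)
{
    while (head != NULL && n > 0) {
        buf_t *next;

        if (n < head->len) {
            head->len -= n;      /* partially consumed, buffer stays alive */
            return head;
        }
        n -= head->len;
        next = head->next;
        free(head);              /* the only place a buffer is released */
        head = next;
    }
    return head;
}

/* Single-owner rule: always store the returned head back into the
 * connection, so no stale pointer to a freed buffer survives. In a
 * multi-threaded program this update would also need a lock. */
void conn_drain(conn_t *c, size_t n)
{
    c->writeq = buf_consume(c->writeq, n);
}
```

Draining an already empty queue is harmless here because the stored head is NULL; the dangerous case is a second copy of the old head pointer that was taken before the drain.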
| Comment by Johan Wikman [ 2015-09-01 ] | ||||||||||||||||||
|
Stacktrace
| ||||||||||||||||||
| Comment by Johan Wikman [ 2015-09-01 ] | ||||||||||||||||||
|
jgold, this crash is most likely caused by a bug that has been fixed in 1.2. It would be great if you could try that out. | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-09-07 ] | ||||||||||||||||||
|
This is caused by the same bug outlined in It is fixed in release 1.2. If an upgrade is not possible, then this needs to be backported to 1.1.1. | ||||||||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-09-07 ] | ||||||||||||||||||
|
johan.wikman since this has been fixed in 1.2, a better resolution would have been "Fixed" with a fixVersion of 1.2. | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-09-07 ] | ||||||||||||||||||
|
Reopened only to change from Won't Fix to Fixed. | ||||||||||||||||||
| Comment by Julian G [ 2015-09-24 ] | ||||||||||||||||||
|
MaxScale is still crashing. Here are the last few lines of the error.log: 2015-09-23 16:30:24 Backend hangup -> closing session. Should I open a new issue? | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-09-24 ] | ||||||||||||||||||
|
jgold What version are you now running? Still 1.1.1 or have you upgraded to 1.2.? | ||||||||||||||||||
| Comment by Julian G [ 2015-09-24 ] | ||||||||||||||||||
|
I've upgraded to 1.2 and also tested with 1.2-debug. | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-09-24 ] | ||||||||||||||||||
|
Could you try with the branch release-1.2.1? It contains a number of fixes and will eventually be released as 1.2.1. | ||||||||||||||||||
| Comment by Julian G [ 2015-09-24 ] | ||||||||||||||||||
|
Is there a repository where I can get deb packages? Or do I need to compile it from GitHub? | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-09-24 ] | ||||||||||||||||||
|
jgold You can download packages from here: http://maxscale-jenkins.mariadb.com/ci-repository/release-1.2.1/mariadb-maxscale/ | ||||||||||||||||||
| Comment by Julian G [ 2015-10-06 ] | ||||||||||||||||||
|
It's still crashing: 2015-10-06 10:48:02 debug assert /home/vagrant/workspace/server/modules/routing/readwritesplit/readwritesplit.c:2584 | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-10-06 ] | ||||||||||||||||||
|
Did you build it yourself or did you download a package? | ||||||||||||||||||
| Comment by Julian G [ 2015-10-06 ] | ||||||||||||||||||
|
I've installed the packages from http://maxscale-jenkins.mariadb.com/ci-repository/release-1.2.1/mariadb-maxscale/ | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-10-06 ] | ||||||||||||||||||
|
Right, that's from our build directory, and you seem (based on the commit id) to be running something older than the final release; it also seems to be a debug version. Please download 1.2.1 from here https://mariadb.com/my_portal/download/maxscale and give it another shot. | ||||||||||||||||||
| Comment by Julian G [ 2015-10-06 ] | ||||||||||||||||||
|
2015-10-06 15:16:36 Fatal: MaxScale 1.2.1 received fatal signal 6. Attempting backtrace. | ||||||||||||||||||
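For context, signal 6 is SIGABRT, and a log line like the one above typically comes from a fatal-signal handler that prints a backtrace before the process dies. A minimal sketch of that technique, assuming glibc's `backtrace` API from `<execinfo.h>`; this is not MaxScale's actual handler, and `fatal_handler` and `install_fatal_handler` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L

#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: print raw stack frames to stderr, then re-raise
 * the signal with the default action so a core file can still be written.
 * Note: backtrace() is not formally async-signal-safe; this is a
 * best-effort diagnostic, as is common in crash handlers. */
static void fatal_handler(int sig)
{
    static const char msg[] = "Fatal signal received. Attempting backtrace.\n";
    void *frames[64];
    int count;
    ssize_t rc;

    rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)rc;

    count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);

    signal(sig, SIG_DFL);   /* restore the default action */
    raise(sig);             /* die with the original signal */
}

/* Hypothetical helper: install the handler for SIGABRT.
 * Returns 0 on success, -1 on failure. */
int install_fatal_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```

The frames printed this way are mostly raw addresses; they have to be translated to file and line with the matching binary and debug symbols, for example with a tool such as addr2line.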
| Comment by Johan Wikman [ 2015-10-06 ] | ||||||||||||||||||
|
That's unfortunate. Thanks for reporting. We'll investigate. | ||||||||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-10-06 ] | ||||||||||||||||||
|
johan.wikman Does this new crash look the same as | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-10-07 ] | ||||||||||||||||||
|
Translated stacktrace: /home/vagrant/workspace/server/core/gateway.c:362 | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-10-07 ] | ||||||||||||||||||
|
jgold if it's not too inconvenient, could you please attach a core file? | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-10-07 ] | ||||||||||||||||||
|
dshjoshi No, this is not the same as | ||||||||||||||||||
| Comment by martin brampton (Inactive) [ 2015-10-28 ] | ||||||||||||||||||
|
As commented above, the logic of dcb_write has been substantially overhauled but those changes do not appear to be in any release. It would be most helpful to know whether the problem still exists with the newer code. As of right now, the best code for this would appear to be branch | ||||||||||||||||||
| Comment by Julian G [ 2015-10-28 ] | ||||||||||||||||||
|
Are there Debian packages available? I've found http://maxscale-jenkins.mariadb.com/ci-repository/MXS-329/ but it only contains CentOS packages. | ||||||||||||||||||
| Comment by Jonathan Frank [ 2015-12-10 ] | ||||||||||||||||||
|
We just deployed MaxScale into production yesterday and have the same problem as reported here. For now, we have reverted to HAProxy. As for the operating system, we are using Ubuntu Trusty 14.04 64-bit. The backtrace looks the same as the one posted here: 2015-12-10 14:03:30 Fatal: MaxScale 1.2.1 received fatal signal 6. Attempting backtrace. | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-12-10 ] | ||||||||||||||||||
|
Indeed, it looks very similar. However, we were never able to reproduce exactly this one, although we tried. We will shortly release 1.3 beta, where a number of concurrency issues have been addressed. Hopefully they will make this one disappear. I will let you know when an Ubuntu version is available. | ||||||||||||||||||
| Comment by Johan Wikman [ 2015-12-15 ] | ||||||||||||||||||
|
Even though the evidence clearly shows that there is a problem, I will close this now as we haven't been able to reproduce it in-house. Also, the expectation is that the problem is caused by some of the concurrency issues that have been corrected in 1.3. If the problem is still present with 1.3, please reopen this one or create a new report. 1.3 beta is available at: http://maxscale-jenkins.mariadb.com/ci-repository/1.3.0-beta-release/ |