[MXS-198] MaxScale received fatal signal 11 Created: 2015-06-15 Updated: 2023-01-10 Resolved: 2015-11-22 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core |
| Affects Version/s: | 1.1.1 |
| Fix Version/s: | 1.3.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Tibs | Assignee: | markus makela |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | None | ||
| Environment: |
CentOS 6.5 |
||
| Issue Links: |
|
||||||||
| Description |
|
Hi, I'm using the MaxScale proxy on more than 40 boxes, and now I got a weird error on one of them. After I restarted MaxScale everything was fine again, but I don't know what caused the problem. If you need anything else, just let me know. |
| Comments |
| Comment by Dipti Joshi (Inactive) [ 2015-06-15 ] | |||||||||||||
|
Can you attach your MaxScale.cnf file? Also, when you say MaxScale on 40 boxes, can you describe your configuration? How are client applications communicating with each MaxScale instance? How many database servers are behind each MaxScale? And, out of curiosity, why do you have 40 boxes running MaxScale? | |||||||||||||
| Comment by Tibs [ 2015-06-15 ] | |||||||||||||
|
Hi Dipti, thank you for your answer. We also run MaxScale on the app servers, listening on 127.0.0.1, so the applications connect to MaxScale through 127.0.0.1. Behind each MaxScale we have 4-6 MySQL boxes (we have many more MySQL servers, but we are still testing MaxScale). If we lose a MySQL server, the app servers keep working without any problem, because MaxScale will not send traffic to the unavailable server. As I wrote, we are still testing, so the configuration may change in the future, but for now this is how we use it. Our MaxScale.cnf has the following sections:
[maxscale]
[MySQL Monitor]
[qla]
[fetch]
#[Write Service]
[Read Connection Router]
[Debug Interface]
[CLI]
#[Write Listener]
[Read Connection Listener]
[Debug Listener]
[CLI Listener]
[server1]
[server2]
If you need anything else, just let me know. | |||||||||||||
| Comment by Geoff Montee (Inactive) [ 2015-06-17 ] | |||||||||||||
|
I've seen a similar, but slightly different, backtrace from MaxScale 1.1.1:
| |||||||||||||
| Comment by Geoff Montee (Inactive) [ 2015-06-17 ] | |||||||||||||
|
Similar backtrace submitted to | |||||||||||||
| Comment by Matt wells [ 2015-06-17 ] | |||||||||||||
|
I think we are seeing this issue happen at the same time a backup kicks off. Could this have something to do with wsrep_desync being turned on and off? MaxScale has failed 3 times in the last 12 hours. | |||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-17 ] | |||||||||||||
|
matt131 We created a separate Jira item | |||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-17 ] | |||||||||||||
|
Tibs Can you please attach the core dump file? | |||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-21 ] | |||||||||||||
|
Tibs Please generate a core dump for this issue and attach it, along with the log files. Do the following to enable core dumps:
After this, run the traffic that causes fatal signal 11. You should find the core file in the /tmp directory, named core-maxscale-* | |||||||||||||
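On Linux, core dumps are typically enabled by raising the core-file size limit (the value `ulimit -c` reports in a shell) for the process that will crash, while the kernel's /proc/sys/kernel/core_pattern decides where the core file is written and how it is named. The minimal Python sketch below, assuming a Linux host, covers only the limit part; `enable_core_dumps` and `core_pattern` are hypothetical helper names, not MaxScale tooling.

```python
from __future__ import annotations

import resource
from pathlib import Path


def enable_core_dumps() -> tuple[int, int]:
    """Raise this process's core-file size soft limit up to its hard limit.

    An unprivileged process may always raise its soft limit as far as the
    hard limit. The new limit applies to this process and is inherited by
    any child processes it starts afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> str | None:
    """Return the kernel's core file naming pattern, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    soft, _hard = enable_core_dumps()
    shown = "unlimited" if soft == resource.RLIM_INFINITY else soft
    print("RLIMIT_CORE soft limit:", shown)
    print("core_pattern:", core_pattern())
```

Because resource limits are inherited, the raised limit has to be in effect in whatever environment actually launches the maxscale process (for example, its init script); raising it in an unrelated shell has no effect on an already running daemon.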
| Comment by Dipti Joshi (Inactive) [ 2015-06-24 ] | |||||||||||||
|
Tibs Please provide the exact OS version of your system. | |||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-26 ] | |||||||||||||
|
Tibs Thanks for the OS version. You can download the MaxScale debug build RPM from http://maxscale-jenkins.mariadb.com/repository/1.1.1-debug/mariadb-maxscale/yum/centos6/x86_64/
Change the repository configuration in your /etc/yum.repos.d/maxscale.repo file.
Then do the following to reinstall MaxScale.
Next, enable core dumps.
After this, run the traffic that causes fatal signal 11. You should find the core file in the /tmp directory, named core-maxscale-* | |||||||||||||
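Once MaxScale has crashed, the newest matching core file can be located with a short script such as the following. It is a sketch that assumes the core-maxscale-* naming in /tmp described above; `newest_core` is a hypothetical helper.

```python
from __future__ import annotations

import glob
import os


def newest_core(directory: str = "/tmp", prefix: str = "core-maxscale-") -> str | None:
    """Return the most recently modified file named prefix* in directory, or None."""
    candidates = glob.glob(os.path.join(directory, prefix + "*"))
    if not candidates:
        return None
    # Pick the file with the latest modification time.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    core = newest_core()
    print(core or "no core-maxscale-* file found in /tmp")
```

A core file is most useful when opened in gdb against the same maxscale binary and debug build that produced it.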
| Comment by markus makela [ 2015-06-26 ] | |||||||||||||
|
It is better to uninstall MaxScale and then reinstall it. This guarantees that the right version is installed.
| |||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-08-17 ] | |||||||||||||
|
markus makela Can you pinpoint where these two locations, libMySQLBackend.so(+0x44a3) and libreadconnroute.so(+0x20c8), are in the source code (1.1.1 on CentOS 6.5)? | |||||||||||||
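One common way to answer this kind of question is to resolve each object(+offset) frame with binutils' addr2line against exactly the same build, ideally one with debug symbols. The Python sketch below parses frames in the glibc backtrace_symbols() style shown above and builds the corresponding addr2line command; `parse_frame`, `addr2line_cmd` and `Frame` are hypothetical helpers, and it assumes that an offset-only frame such as lib.so(+0x44a3) is relative to the shared object's load base, which is usually the case for ordinary shared libraries.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# Matches glibc backtrace_symbols()-style frames such as
#   /usr/lib64/maxscale/libMySQLBackend.so(+0x44a3) [0x7f...]
#   /usr/bin/maxscale(main+0x1a) [0x40...]
FRAME_RE = re.compile(r"(?P<obj>[^\s(]+)\((?P<sym>[^)+]*)\+(?P<off>0x[0-9a-fA-F]+)\)")


class Frame(NamedTuple):
    obj: str     # shared object or executable path as printed
    symbol: str  # empty when only an offset was printed
    offset: int  # offset from the symbol, or from the object's load base


def parse_frame(line: str) -> Optional[Frame]:
    """Extract (object, symbol, offset) from one backtrace line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(m["obj"], m["sym"], int(m["off"], 16))


def addr2line_cmd(frame: Frame) -> Optional[list[str]]:
    """Build an addr2line command for an offset-only frame.

    Frames printed as symbol+offset would first need the symbol's address
    (for example from nm), so they are skipped here.
    """
    if frame.symbol:
        return None
    # -f prints the function name, -C demangles, -e selects the object file.
    return ["addr2line", "-f", "-C", "-e", frame.obj, hex(frame.offset)]
```

For example, the frame libMySQLBackend.so(+0x44a3) yields the command `addr2line -f -C -e libMySQLBackend.so 0x44a3`, which should be run against the library from the matching debug build so that file and line information is available.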
| Comment by markus makela [ 2015-08-17 ] | |||||||||||||
|
By the looks of it, the crash occurred while the backend DCB was being closed and the session spinlock was being acquired. In the 1.1.1 source code, the DCB is closed on line 673 of readconnroute.c, the protocol's close function is called on line 1279 of dcb.c, and the spinlock is acquired on line 1162 of mysql_backend.c. Since the crash occurs while acquiring the spinlock, the session pointer in that function may be invalid. A likely cause of this is a corrupt DCB. | |||||||||||||
| Comment by Johan Wikman [ 2015-09-02 ] | |||||||||||||
|
More readable stacktrace:
| |||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-09-12 ] | |||||||||||||
|
Per Johan this Jira issue will be resolved when | |||||||||||||
| Comment by Johan Wikman [ 2015-11-22 ] | |||||||||||||
|
This was experienced with 1.1.1, which is by now an old version. As it could not readily be reproduced, and as there have since been significant changes in areas likely related to this issue, it is now being closed. |