[MXS-207] MaxScale received fatal signal 11 (libreadwritesplit) Created: 2015-06-17 Updated: 2015-10-19 Resolved: 2015-08-31 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core, readwritesplit |
| Affects Version/s: | 1.1.1 |
| Fix Version/s: | 1.2.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Geoff Montee (Inactive) | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Similar to
|
| Comments |
| Comment by Dipti Joshi (Inactive) [ 2015-06-17 ] | ||||||||
|
GeoffMontee Please attach MaxScale.cnf | ||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-17 ] | ||||||||
|
Massimiliano Pinto, markus makela Please analyze this ticket. | ||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-17 ] | ||||||||
|
Comment by matt131: | ||||||||
| Comment by Matt wells [ 2015-06-18 ] | ||||||||
|
Just to confirm, we've disabled backups and the problem has stopped. This issue has something to do with how backups were being run. Do we have any insight from the application team? -Matt | ||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-18 ] | ||||||||
|
matt131 Can you please locate the core file of the crashed MaxScale in your environment and provide it to us? | ||||||||
| Comment by Matt wells [ 2015-06-18 ] | ||||||||
|
Hi, Do you have instructions on finding that file? -Matt | ||||||||
| Comment by markus makela [ 2015-06-18 ] | ||||||||
|
I tested with MaxScale and a four-node Galera cluster. I set wsrep_desync manually to ON on master and slave nodes and could not reproduce the crash. I also tried doing a backup on master and slave nodes with both xtrabackup and mariadb-backup and could not reproduce the crash. If you can provide more information about the backup method, that would be much appreciated. | ||||||||
| Comment by markus makela [ 2015-06-25 ] | ||||||||
|
Repeated the previous tests while also blocking network traffic to various nodes. I could not repeat the crash. | ||||||||
| Comment by Timofey Turenko [ 2015-06-29 ] | ||||||||
|
The following info is needed:
| ||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-06-29 ] | ||||||||
|
tturenko Please check in eventum 9510 | ||||||||
| Comment by Guillaume Lefranc [ 2015-07-18 ] | ||||||||
|
Hello, what is the status of this issue? Is it fixed in 1.2? | ||||||||
| Comment by Bryan Traywick [ 2015-07-24 ] | ||||||||
|
I've experienced this issue several times as well. Here is the requested info:
The crash MaxScale and MariaDB configs are attached as well as the relevant | ||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-07-26 ] | ||||||||
|
bryantraywick, can you install a debug version of the build to reproduce the crash? Here is what you need to do:
(1) Uninstall the current version of MaxScale and install the debug version
(2) Configure MaxScale per your environment
(3) Enable coredump
(4) Run the traffic that leads to the crash
Now the coredump will be in /tmp and will have a name patterned as core-maxscale-*. Please collect the coredump as well as the config file and log files and attach them to this Jira. The coredump will help us easily identify the root cause and fix it. | ||||||||
| Comment by martin brampton (Inactive) [ 2015-08-20 ] | ||||||||
|
A report from a crash while running a debug build would be extremely helpful for making progress with this fault. Please note that the debug and trace logs should be configured off when running a debug build in production, since otherwise you will suffer excessive log size. This is done through the MaxScale configuration file, something like:
| ||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-08-21 ] | ||||||||
|
markus makela based on the stack trace, please identify the line number where this is crashing in readwritesplit. | ||||||||
| Comment by martin brampton (Inactive) [ 2015-08-28 ] | ||||||||
|
The failure looks likely to have been at what was line 1651 in release 1.1.1 (1652 in release 1.2) which was:
This could be caused by either master_dcb or session being NULL, and the latter is believed to be more likely. | ||||||||
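To illustrate the failure mode described in this comment, here is a minimal sketch of a guarded pointer chain. The `SESSION` and `DCB` structs and the `get_session_data` helper are hypothetical stand-ins invented for this example, not the actual MaxScale types or the code at the line in question; the sketch only assumes that the crashing statement reached session data through `master_dcb`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the session and DCB types;
 * the real MaxScale structures have many more fields. */
typedef struct session {
    void *data;          /* per-session client data */
} SESSION;

typedef struct dcb {
    SESSION *session;    /* may be NULL if the session is already gone */
} DCB;

/* Hypothetical helper: returns the session data reached through
 * master_dcb, or NULL when any pointer in the chain is missing,
 * instead of dereferencing blindly and raising SIGSEGV (signal 11). */
static void *get_session_data(const DCB *master_dcb)
{
    if (master_dcb == NULL) {
        return NULL;
    }
    if (master_dcb->session == NULL) {
        return NULL;
    }
    return master_dcb->session->data;
}
```

A caller would treat a NULL return as "no usable session" and skip the operation. A check like this only avoids the crash; it does not explain why the pointer became NULL in the first place.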
| Comment by markus makela [ 2015-08-31 ] | ||||||||
|
NULL value checks were added to the temporary table functions, and the calls to those functions are now properly done under a spinlock. As was stated before, this only treats the symptoms of NULL DCB/Session pointers, and the real fix will be done in |