[MDEV-10167] MariaDB crashes with Xen PVH - mysqld got signal 11 Created: 2016-06-02 Updated: 2016-09-21 Resolved: 2016-09-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.0.23, 10.0.25, 10.0 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Tomas Mozes | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Gentoo Linux, latest stable Hardware: Software: |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Hello,
160527 9:34:05 [ERROR] mysqld got signal 11 ;
We have a master-slave replication setup. We first upgraded the slave to 10.0.25 and had it running for 2 weeks without problems. Then we did a master failover; it worked for a week and then started to crash. First it crashed on MariaDB 10.0.25, then again on 10.0.25. We downgraded to MariaDB 10.0.23 (the version that worked before the system update), but it crashed again. Then we switched back to the old master instance (failover) with MariaDB 10.0.23, but it crashed again.
The MariaDB instances run virtualized as Xen DomU guests (in PVH mode) on Linux kernel 4.1.24 with 230 GB RAM and 24 vCPUs. MariaDB was compiled with the following Gentoo USE flags: "extraengine jemalloc openssl pam server". The database is mostly used for selects (1000-3000 qps). |
| Comments |
| Comment by Tomas Mozes [ 2016-06-02 ] | ||||||||||||||||||||||||||
|
Gentoo bug report: https://bugs.gentoo.org/show_bug.cgi?id=584828 | ||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2016-06-03 ] | ||||||||||||||||||||||||||
|
It could have been the notorious 'long semaphore wait' crash, except that that one used to produce SIGABRT, not SIGSEGV. hydrapolic, could you please provide a bigger portion of the error log: the whole session from server startup until the restart? | ||||||||||||||||||||||||||
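For anyone triaging a similar log, the SIGABRT/SIGSEGV distinction above can be checked mechanically. The following is only a rough sketch: it assumes crash lines look like the one quoted in the description, and the function name `find_crashes` and the signal table are illustrative, not part of MariaDB.

```python
import re

# Signal numbers as used on Linux; only the two discussed in this report.
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV"}

# Matches lines such as "160527 9:34:05 [ERROR] mysqld got signal 11 ;"
CRASH_RE = re.compile(
    r"^(?P<date>\d{6})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"\[ERROR\]\s+mysqld got signal (?P<sig>\d+)"
)


def find_crashes(lines):
    """Return (date, time, signal_number, signal_name) for each crash line."""
    crashes = []
    for line in lines:
        match = CRASH_RE.match(line.strip())
        if match:
            number = int(match.group("sig"))
            name = SIGNAL_NAMES.get(number, "UNKNOWN")
            crashes.append((match.group("date"), match.group("time"), number, name))
    return crashes


# The crash line quoted in the description of this report.
sample = ["160527 9:34:05 [ERROR] mysqld got signal 11 ;"]
print(find_crashes(sample))
```

A signal 6 entry would point toward the deliberate abort of the 'long semaphore wait' case, while signal 11 is the segmentation fault reported here.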
| Comment by Tomas Mozes [ 2016-06-03 ] | ||||||||||||||||||||||||||
|
I've attached the server log. This is from the slave, which we upgraded to version 10.0.25 on 5.5.2016; we did a failover later on, and the first crash appeared on 27.5.2016. | ||||||||||||||||||||||||||
| Comment by Brian Evans [ 2016-06-08 ] | ||||||||||||||||||||||||||
|
This may be related to, or a possible duplicate of | ||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2016-06-09 ] | ||||||||||||||||||||||||||
|
grknight, how do you figure? Do you see any resemblance in the crash stack trace, output, or anywhere else? | ||||||||||||||||||||||||||
| Comment by Brian Evans [ 2016-06-09 ] | ||||||||||||||||||||||||||
|
@elenst, after reviewing the documents provided here, I must have misunderstood the OP's issue when I talked to him privately on IRC. The dpaste is about to expire, so I'll repost it here
| ||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2016-06-09 ] | ||||||||||||||||||||||||||
|
grknight, thanks for this, it makes much more sense now; and you might still be right about it being a duplicate of If we look at the attached crash logs and error log as a whole, the difference with the typical "long semaphore wait" problem is even more obvious.
256 seconds is almost exactly the time between the crash and the monitor output. So, my best guess is that the server gets the SIGSEGV for some reason (which might well be | ||||||||||||||||||||||||||
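The interval comparison above can be reproduced from two error-log timestamps. This is a minimal sketch under assumptions: the crash timestamp is the one from the description, while the second timestamp is hypothetical (the monitor output is not reproduced in this report), and the helper names are illustrative.

```python
from datetime import datetime


def parse_log_time(stamp):
    """Parse a 'YYMMDD H:MM:SS' error-log timestamp into a datetime."""
    date_part, time_part = stamp.split()
    return datetime.strptime(f"{date_part} {time_part}", "%y%m%d %H:%M:%S")


def seconds_between(first, second):
    """Absolute number of seconds between two error-log timestamps."""
    delta = parse_log_time(second) - parse_log_time(first)
    return abs(delta.total_seconds())


crash_time = "160527 9:34:05"    # from the description
monitor_time = "160527 9:38:21"  # hypothetical, chosen only for illustration
print(seconds_between(crash_time, monitor_time))
```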
| Comment by Tomas Mozes [ 2016-06-10 ] | ||||||||||||||||||||||||||
|
We had a chat with Brian on IRC while trying to shed some more light on this bug report. I was trying to compile MariaDB on Gentoo in a way that would produce the most information to proceed and I was referring to The dpaste output Brian posted was from the testing machine with MariaDB 10.0.25 while trying to crash it with That said, I don't think these two bugs have anything in common - if you crash with It seems we have identified the root cause of these crashes. Last week, we disabled Xen PVH mode (switching back to PV mode) and there has been no crash since (yet). I'll wait another week and if it doesn't crash again, I'll open a bug report on Xen. | ||||||||||||||||||||||||||
| Comment by Tomas Mozes [ 2016-06-21 ] | ||||||||||||||||||||||||||
|
http://lists.xenproject.org/archives/html/xen-users/2016-06/msg00089.html | ||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2016-06-21 ] | ||||||||||||||||||||||||||
|
Some of the crashes are long semaphore waits, but not all. Too bad that you did not provide full, unedited error logs for all crash cases. I am not familiar with Xen PVH or PV mode. | ||||||||||||||||||||||||||
| Comment by Tomas Mozes [ 2016-06-21 ] | ||||||||||||||||||||||||||
|
@jplindst, please see the attached mysqld.err. | ||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2016-06-21 ] | ||||||||||||||||||||||||||
|
I did. It does not contain the long semaphore wait error messages seen in the other error logs, and between startup and the crash there are several hours with no error messages. | ||||||||||||||||||||||||||
| Comment by Tomas Mozes [ 2016-06-21 ] | ||||||||||||||||||||||||||
|
@jplindst, I first attached the crash logs, then the surrounding logs. The information there is complete, just not in one place (sorry for that): mysqld.err contains the full logs except for the crash logs, which are in separate logs. | ||||||||||||||||||||||||||
| Comment by Tomas Mozes [ 2016-09-21 ] | ||||||||||||||||||||||||||
|
This bug can be closed. This was caused by Xen PVH mode. After switching back to PV, the crashes are gone. Thanks for all the help. |