[MDEV-10857] mysqld got signal 11 (MariaDB 10.1.17-MariaDB-1~jessie + Galera 25.3.17r3619) Created: 2016-09-21 Updated: 2019-05-21 Resolved: 2019-05-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Platform Debian, Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.1.17 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Jory | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | galera, innodb, replication | ||
| Environment: |
Linux Debian 8 Jessie. The servers (VPS) are each equipped with 4 cores (CPU average 11%). |
||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
The last week we have had several crashes within our 3-node MariaDB Galera cluster. The first crashes were on the previous version (10.0), before upgrading to the latest (10.1). These crashes have now happened on each of the servers, yet we can't figure out what is causing them. Each crash shows in the log and prevents the node from rejoining the cluster until it is manually restarted using "service mysql stop" & "service mysql start". After this the node joins the cluster, syncs, and is back in business. Please let me know if any information is needed. As this issue occurs in our production environment it needs to be resolved, at least to give us back our night's sleep. |
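The manual recovery described above can be scripted; a minimal ops sketch, assuming a Debian sysvinit setup like the reporter's (service names and the sync check are illustrative, not taken from the ticket):

```shell
#!/bin/sh
# Restart a crashed Galera node so it rejoins the cluster.
# Assumes Debian 8 sysvinit, as in this report.
service mysql stop

# Give mysqld time to release its port and datadir locks.
sleep 5

service mysql start

# Verify the node has rejoined and synced: wsrep_local_state_comment
# should report "Synced" once IST/SST completes.
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
```

This cannot run against the ticket's cluster here, so it is offered only as a sketch of the stop/start cycle the reporter describes.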
| Comments |
| Comment by Elena Stepanova [ 2016-09-21 ] |
|
Looks similar to |
| Comment by Jory [ 2016-09-22 ] |
|
For debugging purposes we turned on the general query log on the 3 nodes. One of the last queries run (note: not the actual last; there is some activity after it) is a GET_LOCK, SELECT, UPDATE... but never the expected RELEASE_LOCK. I don't want to point anyone in the wrong direction, but I read this on your site: "Unsupported explicit locking includes LOCK TABLES, FLUSH TABLES {explicit table list} WITH READ LOCK, (GET_LOCK(), RELEASE_LOCK(), …). Using transactions properly should be able to overcome these limitations. Global locking operators like FLUSH TABLES WITH READ LOCK are supported." |
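The documentation quoted above suggests replacing explicit lock functions with transactions. A hypothetical sketch of what that migration could look like (table, column, and value names are invented for illustration, not taken from the reporter's schema):

```sql
-- Before (GET_LOCK is not replicated by Galera): serialize workers
-- with a named lock around the read-then-update.
-- SELECT GET_LOCK('job_queue', 10);
-- ... read and update the row ...
-- SELECT RELEASE_LOCK('job_queue');

-- After: let InnoDB row locks inside a transaction do the serialization.
START TRANSACTION;
SELECT id, state
  FROM job_queue
 WHERE state = 'pending'
 ORDER BY id
 LIMIT 1
   FOR UPDATE;                        -- row lock replaces the named lock
UPDATE job_queue SET state = 'running'
 WHERE id = 42;                       -- id returned by the SELECT above
COMMIT;
```

The `FOR UPDATE` row lock is held only until COMMIT, and conflicts between nodes are then resolved by Galera's certification instead of an unreplicated server-local lock.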
| Comment by Jory [ 2016-09-23 ] |
|
We just removed the GET_LOCK and RELEASE_LOCK calls and it still crashed... so it's not that. Just thought I'd let you guys know. |
| Comment by Jory [ 2016-09-25 ] |
|
Did a clean install of all 3 MariaDB nodes. We used a 4th node, took one down at a time, reinstalled Debian and MariaDB, and synced (all went perfectly with that!). This was Friday evening; Sunday morning the first node crash occurred, and about an hour later the second node crashed. The clean install did not solve the issue for us... Just thought we would let you guys know. |
| Comment by Jory [ 2016-09-29 ] |
|
Today we had 2 crashes already, on 2 different nodes. Both nodes crashed at the exact moment the logs were rotated. For now we have disabled log rotation for MySQL to see if this is the reason the crash occurs. This issue is getting very annoying, as we can't seem to run for 24 hours without a crash on at least one node. Is there anything we can try or do to fix this? Any help is welcome, as we simply don't know anymore. Added the rotate file for mysql: rotate.rtf |
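For context, a typical Debian logrotate stanza for MySQL looks roughly like the fragment below. The `postrotate` step runs `mysqladmin flush-logs`, which closes and reopens the server's log files, so if the crash is tied to rotation, this is the statement that triggers it. Paths and options here are illustrative, not the contents of the attached rotate.rtf:

```
/var/log/mysql/*.log {
        daily
        rotate 7
        missingok
        compress
        postrotate
                # flush-logs closes and reopens the log files (including
                # the binary log index), which coincides with the crashes.
                test -x /usr/bin/mysqladmin && /usr/bin/mysqladmin flush-logs
        endscript
}
```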
| Comment by Nirbhay Choubey (Inactive) [ 2016-09-29 ] |
|
de Kort Was the stacktrace for the crash during log rotation the same as the one reported in the attached mysql_error.log? |
| Comment by Jory [ 2016-09-29 ] |
|
I have uploaded the last crash log now. |
| Comment by Nirbhay Choubey (Inactive) [ 2016-09-29 ] |
|
The last crash is related to |
| Comment by Jory [ 2016-09-29 ] |
|
More than willing to do that. Will this produce more info about the crash in our error log the next time it happens? If so, I will post the information here as soon as a node goes down again. |
| Comment by Nirbhay Choubey (Inactive) [ 2016-09-29 ] |
|
Yes. In 10.1.17, I pushed a patch to produce some additional information around |
Sure. That will be helpful. Thanks! |
| Comment by Jory [ 2016-09-29 ] |
|
Enabled! Now we just have to wait.... |
| Comment by Jory [ 2016-09-30 ] |
|
That didn't take long. Added the log with the debug info. Above this point there are only lines with "WSREP: cleanup transaction for LOCAL_STATE" SELECT ..... |
| Comment by Nirbhay Choubey (Inactive) [ 2016-09-30 ] |
|
de Kort Thank you. Could you share the full debug error log instead? I need to look at more instances of "Adding/Removing xid_list_entry". |
| Comment by Jory [ 2016-09-30 ] |
|
Hi @nirbhay_c, the log has a lot of user-specific information in it that I can't just post here for the world to read. I hope you understand. I have made a log file with the specific info you asked for above. If that is not enough, is there another way to get the file to you, or to have the debug log not show every query executed? |
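Redacting a debug log before sharing can be done mechanically. A minimal sketch that keeps only the "Adding/Removing xid_list_entry" lines Nirbhay asked about and drops the per-query lines that may contain user data (file names are hypothetical, and the sample log below is invented to stand in for the real error log):

```shell
#!/bin/sh
# Build a tiny sample debug log as a stand-in for the real error log,
# which mixes xid bookkeeping lines with user queries.
cat > mysql_error_debug.log <<'EOF'
[Note] query: SELECT secret_column FROM customers
[Note] WSREP: Adding xid_list_entry for mariadb-bin.001139 (7)
[Note] TC_LOG_BINLOG::mark_xid_done(): Removing xid_list_entry for mariadb-bin.001139 (7)
EOF

# Keep only the xid_list_entry lines; everything else (including the
# user query) is dropped from the file to be shared.
grep -E 'Adding xid_list_entry|Removing xid_list_entry' \
    mysql_error_debug.log > xid_error.log

wc -l xid_error.log
```

The same one-liner applied to the full debug log would produce a shareable xid_error.log without exposing query contents.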
| Comment by Nirbhay Choubey (Inactive) [ 2016-09-30 ] |
|
xid_error.log is what I wanted for now. |
| Comment by Jory [ 2016-10-10 ] |
|
Is there anything we can do or try? This is starting to become unworkable, and annoying to say the least. We never had to worry about the database servers before, and now it's a daily task restarting MariaDB to keep our production site running. Is there any version known not to have this issue that we can downgrade to? Any insight as to what you know about the issue? Is it being worked on? Is it being resolved? Or are you still searching? |
| Comment by Nirbhay Choubey (Inactive) [ 2016-10-10 ] |
|
de Kort I am sorry to hear that. You could disable binary logging to avoid this issue. Regarding the last log attached, I need to know if there was any server restart, SST, or IST around "2016-09-30 11:41:12 140289565125376 [Note] WSREP: TC_LOG_BINLOG::mark_xid_done(): Removing xid_list_entry for mariadb-bin.001139 (7)". Will it be possible for you to upload the full debug error log to the MariaDB FTP site? You can find the instructions here: https://mariadb.com/kb/en/meta/ftp/ |
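Disabling binary logging as suggested is a my.cnf change plus a server restart. A sketch of the workaround, assuming a stock Debian MariaDB config layout (the file path and log locations are illustrative):

```
# /etc/mysql/my.cnf — comment out binary logging to work around the crash,
# then restart mysqld on each node in turn.
[mysqld]
#log_bin       = /var/log/mysql/mariadb-bin
#log_bin_index = /var/log/mysql/mariadb-bin.index
```

Note that Galera replicates writesets through its own mechanism, so cluster replication keeps working without the binary log; only async slaves and point-in-time recovery depend on it.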
| Comment by Jory [ 2016-10-11 ] |
|
I have uploaded the log file. I did take out most of the one-liners that were just a query (SELECT, UPDATE, INSERT, etc.). |
| Comment by Jory [ 2016-10-25 ] |
|
After the 10.1.18 update crashes occurred even more often, up to 7 times in 24 hours, making it unworkable, so we decided to turn off the bin logs for now. Since we turned this off, the servers have all been up for 8 days without any issue. This however is a temporary solution in our eyes. We hope the issue gets solved so bin logging can be enabled again. |
| Comment by Jan Lindström (Inactive) [ 2019-05-21 ] |
|
Can you please try with a more recent version? If the problem is still reproducible, please provide some instructions on how to repeat it. |