[MDEV-10062] Random crash about once a week (mysqld got signal 11)

| Field | Value |
|---|---|
| Created | 2016-05-13 |
| Updated | 2019-03-14 |
| Resolved | 2019-03-14 |
| Status | Closed |
| Project | MariaDB Server |
| Component/s | Server |
| Affects Version/s | 10.1.12, 10.1.13, 10.1.16, 10.1.18 |
| Fix Version/s | 10.1.22 |
| Type | Bug |
| Priority | Major |
| Reporter | Joachim Wickman |
| Assignee | Unassigned |
| Resolution | Fixed |
| Votes | 0 |
| Labels | None |
| Environment | Motherboard: Intel server board S2600GZ |
Description

Occasionally, a couple of times a month, the server restarts after crashing with "mysqld got signal 11;". All our databases use InnoDB tables, and we also use functions, triggers, views and events. I've only kept two backtraces, so there is not much to go on, but in both of them the crash seems to occur in the method mysqld_show_create_get_fields. We have seen similar issues on the development server when we use the EXPLAIN query syntax, but there it also happens only randomly. Here is the last backtrace:

Comments
Comment by Elena Stepanova [ 2016-05-31 ]

Joachim, would you be able to provide a database dump, perhaps from the development server where the problem also occurs? If you can, please upload it to ftp.askmonty.org/private. Also, could you try running a debug version on the development server for a while?
Comment by Elena Stepanova [ 2016-06-29 ]

Hi Joachim,
Comment by Joachim Wickman [ 2016-07-11 ]

Hello, I have attached my.cnf and all the files it includes. I'm using packages from your repository. Do they come with debug versions, and if so, are there any instructions on how to use them? Unfortunately, I'm not allowed to share the database definitions.
Comment by Elena Stepanova [ 2016-07-16 ]

The packages don't include a debug version, but if you are willing to try one, I can build a debug binary for you; just let me know which server version you are currently on. Using the debug binary should be as simple as putting it in place of your mysqld and restarting the server, but of course it shouldn't be done on a production server.
Comment by Joachim Wickman [ 2016-07-19 ]

It's semi-production, i.e. used by developers for modelling and testing, so I believe it should be fine to test the debug version there. We are currently on 10.1.13.
Comment by Elena Stepanova [ 2016-07-19 ]

I've built a mysqld binary from the 10.1.13 sources for Trusty x86_64: ftp://ftp.askmonty.org/public/mdev10062/mysqld . Please try simply replacing your mysqld binary with it (keep the original, of course) and adjust the permissions accordingly. This is in case the server does not print the full stack trace in the error log, which sometimes happens. After it crashes, please upload the resulting core dump to ftp.askmonty.org/private. If the binary does not work for you, please let us know. Thanks,
Comment by Joachim Wickman [ 2016-07-21 ]

Thanks. The binary did work, but sadly it was too slow to be usable. Getting data from a view took around 2-3 seconds with the original binary, but with the debug version it timed out after 1 minute 30 seconds, so I had to switch back. The first time I started it, there were a lot of
Br,
Comment by Joachim Wickman [ 2016-08-05 ]

I've added a crash log. DB4 runs in a virtualized environment, but the Xen hypervisor has exactly the same hardware specs as DB10. The DB4 guest has 4 GB of memory and 2 CPU cores attached. Both machines run the same OS with almost the same configuration files. Entity Developer's "update database from model" was being used when it crashed. Br,
Comment by Joachim Wickman [ 2016-09-01 ]

This seems to be a different scenario, but I'm adding it to the same ticket.
Comment by Elena Stepanova [ 2016-09-01 ]

Yes, the last one does indeed look different. We have a couple of similar ones,
Comment by Joachim Wickman [ 2016-09-29 ]

Still on 10.1.12, and this looks similar.
Comment by Elena Stepanova [ 2016-10-27 ]

The stack trace from DB4 that you attached earlier is from 10.1.16. If you keep upgrading that instance, you should be on 10.1.17 or 10.1.18 by now. Version 10.1.17 introduced a change whereby, upon a crash, the server attempts to fetch the guilty query even if the pointer appears to be invalid. The query should be printed at the very end of the crash report, after the optimizer switch. If it happens again on 10.1.17 or higher, could you please attach the error log including this additional record?
Comment by Joachim Wickman [ 2016-10-28 ]

I looked at the latest crashes with 10.1.17, and they also include the same query pointer.
They also seem to coincide with our backups: at 22:05 a cron job dumps the databases, and the crashes happened at 22:05:01. I had query logging enabled, so I've cleaned up the log and included the last few minutes. Mirth Connect feeds very large XML data into the tables, possibly thousands or tens of thousands of bytes.
Backup command: mysqldump --opt --events --routines --skip-lock-tables [database name]
Sessions:
Comment by Elena Stepanova [ 2016-10-28 ]

Thanks. Do you know whether running SELECT FROM `COLUMNS` causes a crash at other times, e.g. when it's run directly from the client?
Comment by Joachim Wickman [ 2016-10-31 ]

Hi, right now DB4 has 27 878 records and the production server has 50 482. There are no signs of corruption, and running that query several times in the client works without problems.
Comment by Joachim Wickman [ 2016-11-24 ]

Hi, since 8 November I've been running 10.1.18 with debug symbols, compiled myself, and today the first crash happened, so it ran for a long time without crashing. It seems I had forgotten to add some settings to the config, so I added them. A couple of hours later the second crash happened. The service took minutes to restart, so I guess it struggled to create the core file, but the file was only 0 bytes.
Settings added: [mysqld] [mysqld_safe]
# egrep "Units|core" /proc/$(pidof mysqld)/limits
# df -h /var/lib/mysql
Any ideas why? I was hoping to get a core dump and a better stack trace for you.
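The exact settings added here are not shown in full. As a hedged sketch only, the option-file entries that MariaDB's documentation describes for enabling core dumps look roughly like this, assuming the server is started through mysqld_safe (as with the init scripts used on Ubuntu Trusty). This is not necessarily the configuration the reporter used.

```ini
# Sketch only: generic core-dump settings, not the reporter's actual config.
[mysqld]
# Ask mysqld to write a core file when it crashes
core_file

[mysqld_safe]
# mysqld_safe raises the core file size limit (ulimit -c) before starting mysqld
core_file_size = unlimited
```

Whether a non-empty core file actually appears also depends on settings outside my.cnf: the kernel's core_pattern, fs.suid_dumpable (mysqld switches from root to the mysql user at startup, which can make the process non-dumpable), and free space where the core is written, by default mysqld's working directory, the datadir. The egrep over /proc/$(pidof mysqld)/limits and the df check above cover the size-limit and disk-space parts.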
Comment by Joachim Wickman [ 2016-12-07 ]

Hi! I've uploaded a full backtrace from last night's crash to the FTP server. Hopefully it helps. The core file is 2.6 GB, so I did not upload it. // Joachim
Comment by Elena Stepanova [ 2016-12-09 ]

Thank you. Did you upload it to FTP because you want it to remain private, or may I attach it to this issue (in which case it will be publicly available)?
Comment by Joachim Wickman [ 2016-12-09 ]

It's meant to stay private. I have redacted the database names from the attached file.
Comment by Joachim Wickman [ 2017-04-13 ]

FYI, I've upgraded both the production and development servers to MariaDB 10.1.22. Development has been running for 17 days now and production for 2 days, still with no issues. So hopefully this issue has been fixed without anyone knowing about it. I'll report back in a couple of weeks.
Comment by Daniel Black [ 2017-04-13 ]

Thanks.
Comment by Joachim Wickman [ 2017-05-18 ]

Positive news: the system has been running MariaDB 10.1.22 for over 37 days without crashes, so I guess this issue has been fixed. Thank you!
Comment by Joachim Wickman [ 2019-03-14 ]

Status update: MariaDB uptime on that server with 10.1.22 is now 636 days. I guess it's OK to close this issue.
Comment by Elena Stepanova [ 2019-03-14 ]

Closing based on the comment above. The Fix Version is empirical; we don't know which particular commit fixed the problem.