[MDEV-10062] Random crash about once a week (mysqld got signal 11) Created: 2016-05-13  Updated: 2019-03-14  Resolved: 2019-03-14

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.1.12, 10.1.13, 10.1.16, 10.1.18
Fix Version/s: 10.1.22

Type: Bug Priority: Major
Reporter: Joachim Wickman Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Environment:

Motherboard: Intel server board S2600GZ
CPU: 2x Quad-Core Intel Xeon E5-2620 2.00 GHz
Memory: 56GB RAM DDR3-1600 ECC
OS: Ubuntu Trusty 14.04.4 LTS 64 bit
apt-source: deb [arch=amd64,i386] http://mirror.netinch.com/pub/mariadb/repo/10.1/ubuntu trusty main


Attachments: Text File DB10-Crash-2016-09-01.txt     Text File DB10-Crash-2016-09-29.txt     Text File DB4-Crash-2016-08-05.txt     Text File DB4-Crash-2016-10-27.txt     Text File DB4-Crash-2016-11-24-2.txt     Text File MDEV-10062-Backtrace.txt     File mariadb.cnf     File my.cnf     File mysqld_safe_syslog.cnf     Text File query-2016-10-27.log     File tokudb.cnf     File trust.cnf    
Issue Links:
Relates
relates to MDEV-10624 Crash on CREATE VIEW (with invalid de... Closed

 Description   

Occasionally a couple of times a month the server part restarts with a crash of "mysqld got signal 11;".

All our databases uses InnoDB tables. Also functions, triggers, views and events are used.

I've only kept two backtraces so not much to go on but in both of them it seems to access method mysqld_show_create_get_fields.

We have seen similar issues on development server when we use EXPLAIN query syntax, but only happens randomly there also.

Here is the last backtrace:

May 13 08:34:24 db-10 mysqld: 160513  8:34:24 [ERROR] mysqld got signal 11 ;
May 13 08:34:24 db-10 mysqld: This could be because you hit a bug. It is also possible that this binary
May 13 08:34:24 db-10 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
May 13 08:34:24 db-10 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: To report this bug, see http://kb.askmonty.org/en/reporting-bugs
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: We will try our best to scrape up some info that will hopefully help
May 13 08:34:24 db-10 mysqld: diagnose the problem, but since we have already crashed,
May 13 08:34:24 db-10 mysqld: something is definitely wrong and this may fail.
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: Server version: 10.1.12-MariaDB-1~trusty
May 13 08:34:24 db-10 mysqld: key_buffer_size=134217728
May 13 08:34:24 db-10 mysqld: read_buffer_size=2097152
May 13 08:34:24 db-10 mysqld: max_used_connections=47
May 13 08:34:24 db-10 mysqld: max_threads=102
May 13 08:34:24 db-10 mysqld: thread_count=34
May 13 08:34:24 db-10 mysqld: It is possible that mysqld could use up to
May 13 08:34:24 db-10 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 759828 K  bytes of memory
May 13 08:34:24 db-10 mysqld: Hope that's ok; if not, decrease some variables in the equation.
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: Thread pointer: 0x0x7f3e9d631008
May 13 08:34:24 db-10 mysqld: Attempting backtrace. You can use the following information to find out
May 13 08:34:24 db-10 mysqld: where mysqld died. If you see no messages after this, something went
May 13 08:34:24 db-10 mysqld: terribly wrong...
May 13 08:34:24 db-10 mysqld: stack_bottom = 0x7f4ac029a1f0 thread_stack 0x48400
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f4ac417405e]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x38d)[0x7f4ac3ca11bd]
May 13 08:34:24 db-10 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f4ac21f8340]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z19find_field_in_tableP3THDP5TABLEPKcjbPj+0x91)[0x7f4ac3ae09e1]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z23find_field_in_table_refP3THDP10TABLE_LISTPKcjS4_S4_S4_PP4ItembbPjbPS2_+0x878)[0x7f4ac3ae1468]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z20find_field_in_tablesP3THDP10Item_identP10TABLE_LISTS4_PP4Item27find_item_error_report_typebb+0x2d8)[0x7f4ac3ae19a8]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN10Item_field10fix_fieldsEP3THDPP4Item+0x332)[0x7f4ac3cc0642]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN9Item_func10fix_fieldsEP3THDPP4Item+0x19f)[0x7f4ac3cfb68f]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN9Item_func10fix_fieldsEP3THDPP4Item+0x19f)[0x7f4ac3cfb68f]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z12setup_fieldsP3THDPP4ItemR4ListIS1_E17enum_mark_columnsPS5_b+0x17c)[0x7f4ac3ae24fc]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN4JOIN7prepareEPPP4ItemP10TABLE_LISTjS1_jP8st_orderbS7_S1_S7_P13st_select_lexP18st_select_lex_unit+0x312)[0x7f4ac3b6b8b2]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN30subselect_single_select_engine7prepareEv+0x6d9)[0x7f4ac3d314c9]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN14Item_subselect10fix_fieldsEP3THDPP4Item+0xcd)[0x7f4ac3d35b6d]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN9Item_func10fix_fieldsEP3THDPP4Item+0x19f)[0x7f4ac3cfb68f]
May 13 08:34:24 db-10 mysqld: message repeated 2 times: [ /usr/sbin/mysqld(_ZN9Item_func10fix_fieldsEP3THDPP4Item+0x19f)[0x7f4ac3cfb68f]]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN9Item_cond10fix_fieldsEP3THDPP4Item+0xb9)[0x7f4ac3cd1259]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z11setup_condsP3THDP10TABLE_LISTR4ListIS1_EPP4Item+0x18b)[0x7f4ac3ae3d4b]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN4JOIN7prepareEPPP4ItemP10TABLE_LISTjS1_jP8st_orderbS7_S1_S7_P13st_select_lexP18st_select_lex_unit+0x3c4)[0x7f4ac3b6b964]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_ZN18st_select_lex_unit7prepareEP3THDP13select_resultm+0x968)[0x7f4ac3bb8788]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z21mysql_derived_prepareP3THDP3LEXP10TABLE_LIST+0x1ff)[0x7f4ac3afeccf]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z20mysql_handle_derivedP3LEXj+0x226)[0x7f4ac3aff7f6]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z29mysqld_show_create_get_fieldsP3THDP10TABLE_LISTP4ListI4ItemEP6String+0x856)[0x7f4ac3b81ff6]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z18mysqld_show_createP3THDP10TABLE_LIST+0xc0)[0x7f4ac3b824d0]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3015)[0x7f4ac3b22685]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x26d)[0x7f4ac3b28fed]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2460)[0x7f4ac3b2c330]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f4ac3b2cae9]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f4ac3bf00fa]
May 13 08:34:24 db-10 mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f4ac3bf02d0]
May 13 08:34:24 db-10 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f4ac21f0182]
May 13 08:34:24 db-10 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f4ac191347d]
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: Trying to get some variables.
May 13 08:34:24 db-10 mysqld: Some pointers may be invalid and cause the dump to abort.
May 13 08:34:24 db-10 mysqld: Query (0x7f3fbbc20020): is an invalid pointer
May 13 08:34:24 db-10 mysqld: Connection ID (thread ID): 62004
May 13 08:34:24 db-10 mysqld: Status: NOT_KILLED
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
May 13 08:34:24 db-10 mysqld:
May 13 08:34:24 db-10 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
May 13 08:34:24 db-10 mysqld: information that should help you find out what is causing the crash.
May 13 08:34:24 db-10 mysqld_safe: Number of processes running now: 0
May 13 08:34:24 db-10 mysqld_safe: mysqld restarted



 Comments   
Comment by Elena Stepanova [ 2016-05-31 ]

joachim,
Please attach your cnf file(s).

Would you be able to provide a database dump, maybe from the development server where the problem also occurs? If you can, please upload it to ftp.askmonty.org/private.

Also, can you maybe try running a debug version on the development server for a while?

Comment by Elena Stepanova [ 2016-06-29 ]

Hi joachim,
any news on this – are you still experiencing the problem? Would you be able to provide information as requested above?

Comment by Joachim Wickman [ 2016-07-11 ]

Hello,

Attached my.cnf and all files included from my.cnf.

I'm using packages from your repository. Do they come with debug versions and if so, any instructions on how to use them?

Unfortunately I'm not allowed to share the database definitions.

Comment by Elena Stepanova [ 2016-07-16 ]

Packages come without a debug version; but if you are willing to try it, I can build a debug binary for you, just let me know which server version you are currently on. Using the debug binary should be as simple as placing it instead your mysqld and restarting the server, but of course it shouldn't be done on a production server.

Comment by Joachim Wickman [ 2016-07-19 ]

It's semi production, ie used by developers for modelling and testing. So I believe it should be fine by testing debug version.

Currently 10.1.13 is used.

$ sudo apt-cache policy mariadb-server
mariadb-server:
  Installed: 10.1.13+maria-1~trusty
  Candidate: 10.1.16+maria-1~trusty
  Version table:
     10.1.16+maria-1~trusty 0
        500 http://mirror.jmu.edu/pub/mariadb/repo/10.1/ubuntu/ trusty/main amd64 Packages
 *** 10.1.13+maria-1~trusty 0
        100 /var/lib/dpkg/status

Comment by Elena Stepanova [ 2016-07-19 ]

joachim,

I've built mysqld binary from 10.1.13 sources for trusty x864_64: ftp://ftp.askmonty.org/public/mdev10062/mysqld . Please try just to replace your mysqld binary with it (store the original one of course), and change permissions accordingly.
Before you restart the server with it, please also enable the core dump as described here:
https://mariadb.com/kb/en/mariadb/how-to-produce-a-full-stack-trace-for-mysqld/#updating-your-mycnf

This is in case the server does not print the full stack trace in the error log, it happens. After it crashes, please upload the resulting coredump to ftp.askmonty.org/private.

If the binary does not work for you, please let us know.

Thanks,
Elena.

Comment by Joachim Wickman [ 2016-07-21 ]

Thanks.

The binary did work but sadly it became too slow so it was not usable. Getting data from a view in original binary took around 2-3 seconds but with the debug version it timed out after 1 minute and 30 seconds. So I had switch back.

First time I started it then there were alot of
Warning: xxxxx bytes lost at 0x7f6157421370, allocated by T@0 ....
And then it halted itself. Afterwards it has worked but too slow to use.

Br,
Joachim

Comment by Joachim Wickman [ 2016-08-05 ]

Added crash log. DB4 is on a virtualized environment but the XEN hypervisor has the exact same hardware specs as DB10. DB4 guest has 4GB memory and 2 CPU cores attached. Same OS and almost same config files on both.

Entity Developer, update database from model was used when it crashed.

Br,
Joachim

Comment by Joachim Wickman [ 2016-09-01 ]

Seems like another scenario but adding to same ticket.

Comment by Elena Stepanova [ 2016-09-01 ]

Yes, the last one looks indeed different. We have a couple of similar ones, MDEV-9682 was fixed in 10.1.13, and MDEV-10053 is still open.

Comment by Joachim Wickman [ 2016-09-29 ]

Still on 10.1.12 and this looks similar.

Comment by Elena Stepanova [ 2016-10-27 ]

The stack trace from DB4 that you attached earlier is from 10.1.16. If you keep upgrading that instance, you should be on 10.1.17 or .18 by now. The version 10.1.17 came with a change where upon a crash the server attempts to fetch the guilty query even if the pointer appears to be invalid. The query should be printed at the very end of the crash report, after the optimizer switch. If it happens again on 10.1.17 or higher, could you please attach the error log including this additional record?
Thanks.

Comment by Joachim Wickman [ 2016-10-28 ]

Looked at the latest crashes with 10.1.17 and they also includes same query pointer.

We think the query pointer is invalid, but we will try to print it anyway.
Query: SELECT /*!40001 SQL_NO_CACHE */ * FROM `COLUMNS`

And they seems to happen the same time we do backups. 22:05 we have a cron job dumping databases and the crashes has happened 22:05:01. I had query logging enabled so I've cleaned it up and included last minutes. Mirth Connected feeds very large XML data into the tables, could be 1000 or 10000 of bytes.

Backup command mysqldump --opt --events --routines --skip-lock-tables [database name]

Sessions:
288243 is from mysqldump
288244 & 288245 is from Zabbix which monitors MySQL data
All others are from Mirth Connect.

Comment by Elena Stepanova [ 2016-10-28 ]

joachim,

Thanks.
How many records are there (in other words, how many columns do you have in your instance)?
Have you ever seen any signs that any tables might be corrupted?

Do you know if running SELECT FROM `COLUMNS` causes a crash at other times, e.g. when it's run directly from the client?

Comment by Joachim Wickman [ 2016-10-31 ]

Hi,

Right now DB4 has 27 878 records and production server has 50 482.

No signs of corruption. And running that query several times in the client works without problems.

Comment by Joachim Wickman [ 2016-11-24 ]

Hi,

Since 8th November I've been running 10.1.18 with debug symbols compiled in by myself and today the first crash happened so it's been running long with no crashes. Seems I had forgotten to add in config some settings so I did. And a couple of hours later the second crash happened, although it took minutes to restart the service so I guess it did struggle to create the core file but the file was only 0 bytes.

{{[mysqld]
stack-trace
core-file
disable-gdb

[mysqld_safe]
--core-file-size=unlimited}}

{{# egrep "Units|core" /proc/$(pidof mysqld)/limits
Limit Soft Limit Hard Limit Units
Max core file size unlimited unlimited bytes}}

{{# df -h /var/lib/mysql
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 40G 20G 18G 52% /}}

Any ideas why? I was hoping to the get a core dump and a better stack trace for you guys.

Comment by Joachim Wickman [ 2016-12-07 ]

Hi!

I've uploaded to FTP server a full backtrace from a crash last night.

Hopefully it helps. Core file is 2.6GB so did not upload that one.

// Joachim

Comment by Elena Stepanova [ 2016-12-09 ]

Thank you. Did you upload it to FTP because you want it to remain private, or may I attach it to this issue (in which case it will be publicly available)?

Comment by Joachim Wickman [ 2016-12-09 ]

For private, I have redacted database names from the attached file.

Comment by Joachim Wickman [ 2017-04-13 ]

FYI, I've upgraded both production and development servers to MariaDB 10.1.22.

Development has been running for 17 days now, and production for 2 days and still no issues. So hopefully this issue has been fixed without knowing about it.

I'll report back again in a couple of weeks.

Comment by Daniel Black [ 2017-04-13 ]

thanks

Comment by Joachim Wickman [ 2017-05-18 ]

Positive news. System has been running MariaDB 10.1.22 for over 37 days without crashes. So I guess this issue has been fixed.

Thank you!

Comment by Joachim Wickman [ 2019-03-14 ]

Status update, now the MariaDB uptime is 636 days on that server with 10.1.22. Guess it's OK to close this issue.

Comment by Elena Stepanova [ 2019-03-14 ]

Closing based on the comment above. The Fix version is empirical, we don't know which particular commit fixed the problem.

Generated at Thu Feb 08 07:39:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.