[MDEV-7617] [ERROR] mysqld got signal 11 ; InnoDB
Created: 2015-02-21  Updated: 2015-04-01  Due: 2015-03-21  Resolved: 2015-04-01

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5.41, 5.5.42
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Cristian Nicoara
Assignee: Unassigned
Resolution: Incomplete
Votes: 0
Labels: None

Attachments: gdb_debug.txt

 Description   

Hello,
I do not know if this is a bug or not, I am basically asking for your feedback on this behavior:

mysqld: 150219 12:42:01 [ERROR] mysqld got signal 11 ;
mysqld: This could be because you hit a bug. It is also possible that this binary
mysqld: or one of the libraries it was linked against is corrupt, improperly built,
mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
mysqld:
mysqld: To report this bug, see http://kb.askmonty.org/en/reporting-bugs
mysqld:
mysqld: We will try our best to scrape up some info that will hopefully help
mysqld: diagnose the problem, but since we have already crashed,
mysqld: something is definitely wrong and this may fail.
mysqld:
mysqld: Server version: 5.5.41-MariaDB-1~wheezy-log
mysqld: key_buffer_size=16777216
mysqld: read_buffer_size=131072
mysqld: max_used_connections=168
mysqld: max_threads=1002
mysqld: thread_count=157
mysqld: It is possible that mysqld could use up to
mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2214597 K bytes of memory
mysqld: Hope that's ok; if not, decrease some variables in the equation.
mysqld:
mysqld: Thread pointer: 0x0x7f8d59817000
mysqld: Attempting backtrace. You can use the following information to find out
mysqld: where mysqld died. If you see no messages after this, something went
mysqld: terribly wrong...
mysqld: stack_bottom = 0x7f8d516fae50 thread_stack 0x40000
kernel: [772494.249397] mysqld[29838]: segfault at 7f8f39024c28 ip 00007f8f3651e2c1 sp 00007f8d516f7fb0 error 7 in libc-2.13.so[7f8f364a8000+182000]
mysqld_safe: Number of processes running now: 0
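
The kernel line at the end of this log carries some detail that the mysqld output lacks, since the backtrace was never written. As a rough sketch only (the function name `decode_segfault_line` is made up for illustration, and the bit meanings assume the x86 page-fault error code), the line can be decoded like this:

```python
import re

# Matches the Linux kernel message logged for a user-space segfault on x86.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault_line(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # x86 page-fault error code bits (assumption: x86/x86_64 kernel).
        "protection_violation": bool(err & 1),  # 0 would mean "page not present"
        "write_access": bool(err & 2),          # 0 would mean a read
        "user_mode": bool(err & 4),             # 0 would mean kernel mode
    }


line = ("kernel: [772494.249397] mysqld[29838]: segfault at 7f8f39024c28 "
        "ip 00007f8f3651e2c1 sp 00007f8d516f7fb0 error 7 "
        "in libc-2.13.so[7f8f364a8000+182000]")
info = decode_segfault_line(line)
print(info["object"], hex(info["offset_in_object"]), info)
```

For the line above, this reports a user-mode write that hit a protection violation, with the faulting instruction at offset 0x762c1 inside libc-2.13.so. That only says where the fault surfaced, not which server code led to it.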

We use InnoDB.
This has happened a few times so far. Our setup also includes:
– TokuDB
– the audit plugin, which was not active when the crash happened
– a master-slave configuration; the master was affected
– we had some synchronization problem between master and slave
Do you have any similar occurrences?
Thank you



 Comments   
Comment by Elena Stepanova [ 2015-02-21 ]

Hi,

I do not know if this is a bug or not, I am basically asking for your feedback on this behavior:

Do you have any similar occurrences?

It's most certainly a bug unless you kill your server with SIGSEGV manually, which is obviously not the case.
Unfortunately, the error log you posted is entirely generic: every crash produces this text, so there is no way to tell whether we already know of similar ones. The important part is missing (not because you didn't copy it, but because the server failed to write it).
Could you please check whether the coredump was produced and if it was, whether it's still available? If it is, please run gdb --batch --eval-command="thread apply all bt" <path to your mysqld> <path to your core>, and store the coredump in some safe place.
Do you happen to have general log enabled?

we had some synchronization problem between master and slave

What kind of a problem? And which one crashed, master or slave?
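
The gdb invocation requested in this comment can also be scripted so that the backtrace is saved to a file. The following is a minimal sketch, not an official procedure: the helper names and the example paths are hypothetical, and only the gdb options quoted in the comment are used.

```python
import subprocess


def build_backtrace_command(mysqld_path, core_path):
    """Build the gdb command line that prints a backtrace for every thread."""
    return [
        "gdb",
        "--batch",
        "--eval-command=thread apply all bt",
        mysqld_path,
        core_path,
    ]


def save_backtrace(mysqld_path, core_path, out_path):
    """Run gdb on the core file and write its output (stdout and stderr) to out_path."""
    with open(out_path, "w") as out:
        result = subprocess.run(
            build_backtrace_command(mysqld_path, core_path),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,  # gdb may exit non-zero; keep whatever output it produced
        )
    return result.returncode


# Example call (hypothetical paths, adjust to the real binary and core file):
# save_backtrace("/usr/sbin/mysqld", "/var/lib/mysql/core", "gdb_debug.txt")
```

The binary passed to gdb should be the same build that produced the core file; otherwise the symbols in the backtrace will not line up.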

Comment by Cristian Nicoara [ 2015-02-21 ]

Hello,

There is no core dump available; I will try to enable core dumps.
The synchronization problem was due to a configuration error: we had two slaves with the same server_id. That was fixed, but the segmentation fault occurred again.
I have also enabled general logging.

Also I must say that we are using TokuDB and the audit plugin.

Looking in mysql-slow.log (the only MySQL log available), I found one query that appeared just before the restart on every segmentation fault occurrence. Running the query manually did not reproduce the segfault; running it from the application may trigger it (I can only test this on Monday). This might also be a coincidence.

Comment by Cristian Nicoara [ 2015-02-22 ]

Hello,

The query was not a coincidence: I was able to reproduce the issue, but only when the query is triggered from the user interface. The investigation continues.

Comment by Cristian Nicoara [ 2015-02-22 ]

Since this comes from the user interface, could it be related to the Java connector? We are using Java 7 and mysql-connector-java-5.1.21.jar.

Comment by Cristian Nicoara [ 2015-02-22 ]

I was able to build mysql with the debug option enabled, and I got a stack trace:

stack_bottom = 0x7fb003583e68 thread_stack 0x40000
mysys/stacktrace.c:246(my_print_stacktrace)[0xcfdc1a]
sql/signal_handler.cc:155(handle_fatal_signal)[0x7de851]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7fb1d2a190a0]
sql/opt_subselect.cc:3290(fix_semijoin_strategies_for_picked_join_order(JOIN*))[0x76e7c1]
sql/sql_select.cc:7730(get_best_combination(JOIN*))[0x66fd6c]
sql/sql_select.cc:3832(make_join_statistics)[0x666c8d]
sql/sql_select.cc:1229(JOIN::optimize())[0x65daba]
sql/sql_union.cc:611(st_select_lex_unit::optimize())[0x6dd77b]
sql/sql_derived.cc:780(mysql_derived_optimize(THD*, LEX*, TABLE_LIST*))[0x608476]
sql/sql_derived.cc:192(mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int))[0x607461]
sql/table.cc:6626(TABLE_LIST::handle_derived(LEX*, unsigned int))[0x6fcf01]
sql/sql_lex.cc:3554(st_select_lex::handle_derived(LEX*, unsigned int))[0x623262]
sql/table.cc:6624(TABLE_LIST::handle_derived(LEX*, unsigned int))[0x6fcec4]
sql/sql_lex.cc:3554(st_select_lex::handle_derived(LEX*, unsigned int))[0x623262]
sql/sql_select.cc:991(JOIN::optimize())[0x65cd8b]
sql/sql_select.cc:3080(mysql_select(THD*, Item**, TABLE_LIST, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x664323]
sql/sql_update.cc:1438(mysql_multi_update(THD*, TABLE_LIST*, List<Item>, List<Item>, Item*, unsigned long long, enum_duplicates, bool, st_select_lex_unit*, st_select_lex*, multi_update**))[0x6e2a72]
sql/sql_parse.cc:2915(mysql_execute_command(THD*))[0x62eb48]
sql/sql_parse.cc:5909(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x636b16]
sql/sql_parse.cc:1081(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x62a759]
sql/sql_parse.cc:793(do_command(THD*))[0x6298e5]
sql/sql_connect.cc:1266(do_handle_one_connection(THD*))[0x72bf52]
sql/sql_connect.cc:1182(handle_one_connection)[0x72ba11]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7fb1d2a10b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb1d197870d]
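
When reading a resolved trace like the one above, the first frame below the crash-reporting code is usually the one to look at first. As an illustrative sketch (the helper names are hypothetical, and the frame format is assumed to be the `file:line(function)[address]` shape seen in this trace), that frame can be picked out as follows:

```python
import re

# One resolved frame: file:line(function signature)[0xaddress]
FRAME_RE = re.compile(
    r"^(?P<file>[^:()\[\]]+):(?P<line>\d+)\((?P<func>.*)\)\[(?P<addr>0x[0-9a-f]+)\]$"
)

# Frames that belong to the crash reporting itself, not to the failing code.
REPORTING_FUNCS = {"my_print_stacktrace", "handle_fatal_signal"}


def parse_frame(text):
    """Parse one trace line; return None for lines without file:line information."""
    m = FRAME_RE.match(text.strip())
    if m is None:
        return None
    return {
        "file": m.group("file"),
        "line": int(m.group("line")),
        # Drop the argument list, keep only the function name.
        "function": m.group("func").split("(")[0],
        "address": m.group("addr"),
    }


def first_server_frame(lines):
    """Return the first resolved frame that is not part of the signal handler."""
    for text in lines:
        frame = parse_frame(text)
        if frame is not None and frame["function"] not in REPORTING_FUNCS:
            return frame
    return None


trace = [
    "mysys/stacktrace.c:246(my_print_stacktrace)[0xcfdc1a]",
    "sql/signal_handler.cc:155(handle_fatal_signal)[0x7de851]",
    "/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7fb1d2a190a0]",
    "sql/opt_subselect.cc:3290(fix_semijoin_strategies_for_picked_join_order(JOIN*))[0x76e7c1]",
    "sql/sql_select.cc:7730(get_best_combination(JOIN*))[0x66fd6c]",
]
frame = first_server_frame(trace)
print(frame["file"], frame["line"], frame["function"])
```

For this trace the selected frame is sql/opt_subselect.cc line 3290, in fix_semijoin_strategies_for_picked_join_order, reached from mysql_multi_update further down the stack.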

Comment by Cristian Nicoara [ 2015-02-22 ]

gdb output

Comment by Cristian Nicoara [ 2015-02-22 ]

Hello again,
>>> Could you please check whether the coredump was produced and if it was, whether it's still available? If it is, please run gdb --batch --eval-command="thread apply all bt" <path to your mysqld> <path to your core>, and store the coredump in some safe place.

The gdb output is in the previous comment.

Comment by Cristian Nicoara [ 2015-02-24 ]

To rule out a library mismatch, I installed MariaDB 5.5.42 on a fresh Debian wheezy installation; the segmentation fault still occurred.

Comment by Dan Stanculescu [ 2015-02-24 ]

This was also reproduced on a new Debian 7 installation with a single MariaDB instance (no master-slave), without TokuDB and without the audit plugin.
Only InnoDB tables are involved.
OPTIMIZE TABLE was also run on all tables to make sure they are OK.
Same behavior: [ERROR] mysqld got signal 11 ;

Comment by Cristian Nicoara [ 2015-02-24 ]

We tested against MariaDB 10.0.16 and got the same segmentation fault.

Comment by Elena Stepanova [ 2015-02-24 ]

Hi,
Since you have already found the query which causes the problem, and are able to reproduce it, could you provide your test script and the data so that we could reproduce it locally?
You can upload it to ftp.askmonty.org/private, this way only MariaDB developers will have access to it.

Comment by Cristian Nicoara [ 2015-02-25 ]

Hello,

I found that the bug was introduced in version 5.5.38.

Comment by Elena Stepanova [ 2015-02-25 ]

Great, but what about the test case? Could you please provide it?

Comment by Elena Stepanova [ 2015-04-01 ]

It's possible that the reason is the same as in MDEV-7613. Unfortunately, without further information we can't know for sure. Closing for now as 'Incomplete', please comment to re-open if you have additional info.
