[MDEV-8370] mysqld got signal 11 with aggressive parallel replication Created: 2015-06-24  Updated: 2015-06-29  Resolved: 2015-06-29

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.5
Fix Version/s: 10.1.6

Type: Bug Priority: Major
Reporter: Jean-François Gagné Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 0
Labels: None
Environment:

$ uname -a
Linux hostname 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ free -m
total used free shared buffers cached
Mem: 193658 131919 61739 0 26 124488
-/+ buffers/cache: 7404 186253
Swap: 1023 5 1018

$ cat /proc/cpuinfo | grep "model name" | uniq -c
24 model name : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz


Attachments: PNG File e2_open_tables.png    
Issue Links:
Relates
relates to MDEV-8333 mysqld got signal 11 Closed
relates to MDEV-8339 Server crash during table cache eviction Closed

 Description   

Hi,

I have a "mysqld got signal 11" in the logs when running aggressive parallel replication with high number of threads. When running with single threaded replication, I have no problem: I ran many times without problems with 5, 10, 20, 40, 80 and 120 thread; but when I run with 320, 240 and 160 threads, sometimes it works, and sometimes mysqld is killed. Stack trace below.

Already reported similar problems in the following bugs for mysql_admin_table and execute_sqlcom_select:

but I think this might be a different issue as tis one is on mysql_insert.

Also, the number of Open Tables as shown by SHOW GLOBAL STATUS is much bigger with aggressive parallel replication than with single threaded replication (see image in attachment where the 1st run is with single threaded replication, and the 3 other runs are with aggressive parallel replication with number of threads of 120, 240 and 320 respectively).

2015-06-21 23:46:45 140496568968960 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2015-06-21 23:46:45 140496568968960 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.000010' at position 929208383; GTID position '0-187225014-740885929', GTID event skip 1
2015-06-21 23:52:21 140496568968960 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2015-06-21 23:52:21 140496568968960 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.000011' at position 3634315; GTID position '0-187225014-740980660', GTID event skip 205
150621 23:52:26 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.5-MariaDB-log
key_buffer_size=268435456
read_buffer_size=4194304
max_used_connections=7
max_threads=3002
thread_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15693505 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7fa644812008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fd2b194ac80 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0x7fd2eeda683b]
sql/signal_handler.cc:153(handle_fatal_signal)[0x7fd2ee9408a5]
/lib64/libpthread.so.0(+0xf710)[0x7fd2edf4d710]
sql/sql_plist.h:181(I_P_List_iterator)[0x7fd2ee8f2ab3]
mysys/lf_hash.c:125(lfind)[0x7fd2eedafee0]
sql/table_cache.cc:1084(tdc_iterate(THD*, char (*)(void*, void*), void*, bool))[0x7fd2ee8f4093]
sql/table_cache.cc:267(tc_add_table(THD*, TABLE*))[0x7fd2ee8f43b8]
sql/sql_base.cc:2572(open_table(THD*, TABLE_LIST*, Open_table_context*))[0x7fd2ee787e82]
sql/sql_base.cc:4016(open_and_process_table)[0x7fd2ee78bcc0]
sql/sql_base.cc:5184(open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*))[0x7fd2ee78bf84]
sql/sql_insert.cc:719(mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool))[0x7fd2ee7b871b]
sql/sql_parse.cc:3866(mysql_execute_command(THD*))[0x7fd2ee7cd687]
sql/sql_parse.cc:7173(mysql_parse)[0x7fd2ee7d374d]
sql/log_event.cc:4274(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int))[0x7fd2eea0092c]
sql/log_event.h:1348(Log_event::apply_event(rpl_group_info*))[0x7fd2ee760711]
sql/slave.cc:3291(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*))[0x7fd2ee756b1b]
sql/rpl_parallel.cc:49(rpt_handle_event)[0x7fd2ee8db6e2]
perfschema/pfs.cc:1863(pfs_spawn_thread)[0x7fd2eec9435d]
/lib64/libpthread.so.0(+0x79d1)[0x7fd2edf459d1]
/lib64/libc.so.6(clone+0x6d)[0x7fd2ec42b86d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fa687a5580f): is an invalid pointer
Connection ID (thread ID): 3719
Status: NOT_KILLED
 
Optimizer switch: index_merge=off,index_merge_union=off,index_merge_sort_union=off,index_merge_intersection=off,index_merge_sort_intersection=off,engin
e_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=o
n,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_c
ache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,exte
nded_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150621 23:52:36 mysqld_safe Number of processes running now: 0
150621 23:52:36 mysqld_safe mysqld restarted

JFG



 Comments   
Comment by Elena Stepanova [ 2015-06-24 ]

MDEV-8333 provides stack traces both for DELETE and SELECT, as far as I remember, it might fail on INSERT as well, depending on luck and build; so I don't think there is a difference here. I will assign it to svoj to confirm.

Comment by Jean-François Gagné [ 2015-06-24 ]

I was surprised by the Open Tables difference between a run without parallel replication and with different number of threads (maybe normal, just asking).

Note: all those tests start by restoring a backup and running the same binlogs, so exactly the same workload run in all tests, hence the difference in Open Tables puzzling me.

Comment by Sergey Vojtovich [ 2015-06-24 ]

I suspect high number of tables should be expected with parallel replication, same as with high number of connecting clients.

Comment by Elena Stepanova [ 2015-06-24 ]

Tests in MDEV-8333 / MDEV-8339 are essentially concurrent. I brought the number of threads down to 2 to make debugging simpler, but generally the more threads, the easier to reproduce. Parallel replication apparently introduces the same concurrent pattern, only on the slave.

Comment by Sergey Vojtovich [ 2015-06-29 ]

Fixed in scope of MDEV-8339.

Generated at Thu Feb 08 07:26:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.