[MDEV-8339] Server crash during table cache eviction Created: 2015-06-19  Updated: 2015-06-29  Resolved: 2015-06-29

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.5
Fix Version/s: 10.1.6

Type: Bug Priority: Major
Reporter: Sergey Vojtovich Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-8333 mysqld got signal 11 Closed
relates to MDEV-8370 mysqld got signal 11 with aggressive ... Closed

 Description   
  • we need to run out of table_open_cache to enter this code branch
  • there were ~5 active + N stale connections, though low connection number should be fine
    if they can simulate out of table cache (e.g. one connection locks/unlocks table_open_cache
    table instances, while other connections accessing other tables)
  • number of tables is not known
  • table_definition_cache may (or may not) relate to this problem
  • there should be no DDL/admin statements involved (though they're not excluded)
  • there should be no difference which DML statements are involved, so plain SELECT-s should be fine

2015-06-17 14:01:32 140497706510080 [Note] WSREP: Stop replication
2015-06-17 14:01:32 140497706510080 [Note] Event Scheduler: Purging the
queue. 0 events
2015-06-17 14:01:32 140497706510080 [Note]
/usr/mariadb/mariadb/bin//mysqld: Shutdown complete
 
150617 14:01:32 mysqld_safe mysqld from pid file /home/mysql/pid/local.pid
ended
150617 14:02:38 mysqld_safe Starting mysqld daemon with databases from
/home/mysql/data/local/
2015-06-17 14:02:38 140260849645440 [Warning] 'THREAD_CONCURRENCY' is
deprecated and will be removed in a future release.
2015-06-17 14:02:38 140260849645440 [Note] /usr/mariadb/mariadb/bin//mysqld
(mysqld 10.1.5-MariaDB-log) starting as process 2321 ...
2015-06-17 14:02:38 140260849645440 [Note] Plugin 'InnoDB' is disabled.
2015-06-17 14:02:38 140260849645440 [Note] Plugin 'FEEDBACK' is disabled.
2015-06-17 14:02:38 140260849645440 [Note] Server socket created on IP:
'::'.
2015-06-17 14:02:38 140260849645440 [Note] Event Scheduler: Loaded 0 events
2015-06-17 14:02:38 140260849645440 [Note] WSREP: Read nil XID from storage
engines, skipping position init
2015-06-17 14:02:38 140260849645440 [Note] WSREP: wsrep_load(): loading
provider library 'none'
2015-06-17 14:02:38 140260849645440 [Note]
/usr/mariadb/mariadb/bin//mysqld: ready for connections.
Version: '10.1.5-MariaDB-log'  socket: '/tmp/mysql.BH.sock'  port: 3307
 MariaDB Server
150617 17:02:14 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.5-MariaDB-log
key_buffer_size=629145600
read_buffer_size=4194304
max_used_connections=23
max_threads=102
thread_count=16
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
1452041 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f90d0353008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f9103539e80 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0x7f9106c50b0b]
sql/signal_handler.cc:153(handle_fatal_signal)[0x7f91067eadb5]
/lib64/libpthread.so.0(+0x10d00)[0x7f910537dd00]
sql/sql_plist.h:181(I_P_List_iterator)[0x7f910679cf03]
mysys/lf_hash.c:125(lfind)[0x7f9106c5a1a0]
sql/table_cache.cc:1084(tdc_iterate(THD*, char (*)(void*, void*), void*,
bool))[0x7f910679e4e3]
sql/table_cache.cc:267(tc_add_table(THD*, TABLE*))[0x7f910679e808]
sql/sql_base.cc:2572(open_table(THD*, TABLE_LIST*,
Open_table_context*))[0x7f9106633272]
sql/sql_base.cc:4016(open_and_process_table)[0x7f91066370c8]
sql/sql_base.cc:5184(open_and_lock_tables(THD*, DDL_options_st const&,
TABLE_LIST*, bool, unsigned int, Prelocking_strategy*))[0x7f9106637394]
sql/sql_parse.cc:5718(execute_sqlcom_select)[0x7f910666f083]
sql/sql_parse.cc:2938(mysql_execute_command(THD*))[0x7f910667a9ce]
sql/sql_parse.cc:7173(mysql_parse)[0x7f910667dc3d]
sql/sql_parse.cc:1470(dispatch_command(enum_server_command, THD*, char*,
unsigned int))[0x7f91066802a8]
sql/sql_parse.cc:1095(do_command(THD*))[0x7f910668083b]
sql/sql_connect.cc:1347(do_handle_one_connection(THD*))[0x7f91067435ef]
sql/sql_connect.cc:1260(handle_one_connection)[0x7f9106743747]
/lib64/libpthread.so.0(+0x7654)[0x7f9105374654]
/lib64/libc.so.6(clone+0x6d)[0x7f9104a987dd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f90c8424020): is an invalid pointer
Connection ID (thread ID): 516
Status: NOT_KILLED
 
Optimizer switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150617 17:02:14 mysqld_safe Number of processes running now: 0
150617 17:02:14 mysqld_safe mysqld restarted
2015-06-17 17:02:15 140086665152384 [Warning] 'THREAD_CONCURRENCY' is
deprecated and will be removed in a future release.
2015-06-17 17:02:15 140086665152384 [Note] /usr/mariadb/mariadb/bin//mysqld
(mysqld 10.1.5-MariaDB-log) starting as process 9785 ...
2015-06-17 17:02:15 140086665152384 [Note] mysqld: Aria engine: starting
recovery
recovered pages: 0% 10% 21% 31% 41% 51% 78% 90% 100% (0.0 seconds); tables
to flush: 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18
17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 (0.3 seconds);
2015-06-17 17:02:15 140086665152384 [Note] mysqld: Aria engine: recovery
done
2015-06-17 17:02:15 140086665152384 [Note] Plugin 'InnoDB' is disabled.
2015-06-17 17:02:15 140086665152384 [Note] Plugin 'FEEDBACK' is disabled.
2015-06-17 17:02:15 140086665152384 [Note] Recovering after a crash using
tc.log
2015-06-17 17:02:15 140086665152384 [Note] Starting crash recovery...
2015-06-17 17:02:15 140086665152384 [Note] Crash recovery finished.
2015-06-17 17:02:15 140086665152384 [Note] Server socket created on IP:
'::'.
2015-06-17 17:02:15 140086665152384 [Note] Event Scheduler: Loaded 0 events
2015-06-17 17:02:15 140086665152384 [Note] WSREP: Read nil XID from storage
engines, skipping position init
2015-06-17 17:02:15 140086665152384 [Note] WSREP: wsrep_load(): loading
provider library 'none'
2015-06-17 17:02:15 140086665152384 [Note]
/usr/mariadb/mariadb/bin//mysqld: ready for connections.
Version: '10.1.5-MariaDB-log'  socket: '/tmp/mysql.BH.sock'  port: 3307
 MariaDB Server
2015-06-17 17:02:19 140086617798400 [ERROR] mysqld: Table
'./conquista_comercial/est_atu' is marked as crashed and should be repaired
2015-06-17 17:02:19 140086617798400 [Warning] Checking table:
'./conquista_comercial/est_atu'
2015-06-17 17:02:19 140086617798400 [ERROR] mysqld: Table
'./conquista_comercial/est_spa' is marked as crashed and should be repaired
2015-06-17 17:02:19 140086617798400 [Warning] Checking table:
'./conquista_comercial/est_spa'
2015-06-17 17:02:51 140086617798400 [ERROR] mysqld: Table
'./conquista_comercial/est_dia' is marked as crashed and should be repaired
2015-06-17 17:02:51 140086617798400 [Warning] Checking table:
'./conquista_comercial/est_dia'
2015-06-17 17:02:52 140086617798400 [ERROR] mysqld: Table
'./conquista_comercial/rastreabilidade_impressao' is marked as crashed and
should be repaired
2015-06-17 17:02:52 140086617798400 [Warning] Checking table:
'./conquista_comercial/rastreabilidade_impressao'
2015-06-17 17:29:58 140086663948032 [ERROR] mysqld: Table
'./conquista_comercial/est_atu_erros' is marked as crashed and should be
repaired
2015-06-17 17:29:58 140086663948032 [Warning] Checking table:
'./conquista_comercial/est_atu_erros'
2015-06-17 19:24:04 140086538364672 [Note]
/usr/mariadb/mariadb/bin//mysqld: Normal shutdown
 
2015-06-17 19:24:04 140086538364672 [Note] WSREP: Stop replication
2015-06-17 19:24:06 140086538364672 [Note] Event Scheduler: Purging the
queue. 0 events
2015-06-17 19:24:07 140086538364672 [Note]
/usr/mariadb/mariadb/bin//mysqld: Shutdown complete
 
150617 19:24:07 mysqld_safe mysqld from pid file /home/mysql/pid/local.pid
ended



 Comments   
Comment by Sergey Vojtovich [ 2015-06-19 ]

elenst, could you try to reproduce this?

Comment by Elena Stepanova [ 2015-06-19 ]

See also MDEV-8333

Comment by Elena Stepanova [ 2015-06-19 ]

RQG grammar mdev8339.yy

query_init:
	CREATE TABLE IF NOT EXISTS t1 (`pk` INT AUTO_INCREMENT PRIMARY KEY, `c` VARCHAR(32), KEY(`c`)) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1; CREATE TABLE IF NOT EXISTS t3 LIKE t1; CREATE TABLE IF NOT EXISTS t4 LIKE t1;
 
query:
	SELECT * FROM { 't'.(int(rand(500))+1) } ORDER BY `pk` LIMIT 1 |
	INSERT INTO { 't'.(int(rand(500))+1) } (`c`) VALUES ( _english ), ( _english ), ( _english ), ( _english ), ( _english ) |
	DELETE FROM { 't'.(int(rand(500))+1) } WHERE `c` = _english ORDER BY `pk` LIMIT _digit ;

RQG command line

perl ./runall-new.pl --mysqld=--table-open-cache=3 --threads=2 --duration=300 --queries=100M --skip-gendata --grammar=mdev8339.yy --basedir=<basedir> --vardir=<vardir>

Note: the degenerate value table_open_cache=3 is not critical, it just makes the test faster (less tables to create). However, the problem is reproducible as well with e.g. the default value table_open_cache=400 and with 500 tables.

Stack trace from 10.1.5 release bintar

#2  <signal handler called>
#3  0x00000000007288c1 in I_P_List_iterator (a=..., this=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_plist.h:181
#4  free_tables_back (this=<optimized out>) at /home/buildbot/buildbot/build/sql/table_cache.h:110
#5  tc_add_table_callback (element=0x7f45e7d2cea8, arg=0x7f45e6b20700) at /home/buildbot/buildbot/build/sql/table_cache.cc:237
#6  0x0000000000be883b in lfind (head=0x7f45e7c12078, cs=0x0, hashnr=0, key=0x7f45e6b20700 "test", keylen=<optimized out>, cursor=0x7f45e6b20570, pins=0x7f4610d08180, callback=0x728850 <tc_add_table_callback(TDC_element*, tc_add_table_arg*)>) at /home/buildbot/buildbot/build/mysys/lf_hash.c:125
#7  0x0000000000be8afe in lf_hash_iterate (hash=0x1475200, pins=0x7f4610d08180, action=0x728850 <tc_add_table_callback(TDC_element*, tc_add_table_arg*)>, argument=0x7f45e6b20700) at /home/buildbot/buildbot/build/mysys/lf_hash.c:517
#8  0x0000000000727870 in tdc_iterate (thd=0x7f45f4b86008, action=0x728850 <tc_add_table_callback(TDC_element*, tc_add_table_arg*)>, argument=0x7f45e6b20700, no_dups=false) at /home/buildbot/buildbot/build/sql/table_cache.cc:1082
#9  0x0000000000728658 in tc_add_table (thd=0x7f45f4b86008, table=0x7f45e7068c08) at /home/buildbot/buildbot/build/sql/table_cache.cc:265
#10 0x0000000000586e65 in open_table (thd=0x7f45f4b86008, table_list=0x7f45e7c20140, ot_ctx=0x7f45e6b21090) at /home/buildbot/buildbot/build/sql/sql_base.cc:2572
#11 0x00000000005883b1 in open_and_process_table (ot_ctx=<optimized out>, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=<optimized out>, lex=0x7f45f4b897c8, thd=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_base.cc:4016
#12 open_tables (thd=0x7f45f4b86008, options=..., start=0x7f45e6b21148, counter=0x7f45e6b2116c, flags=0, prelocking_strategy=0x7f45e6b21920) at /home/buildbot/buildbot/build/sql/sql_base.cc:4526
#13 0x00000000005888eb in open_and_lock_tables (thd=0x3, options=..., tables=0x7f45e7c20140, derived=true, flags=3889352776, prelocking_strategy=0x42e1) at /home/buildbot/buildbot/build/sql/sql_base.cc:5184
#14 0x00000000005a79cf in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_base.h:530
#15 mysql_delete (thd=0x7f45f4b86008, table_list=0x7f45e7c20140, conds=0x7f45e7c208d8, order_list=0x7f45f4b8a1b8, limit=1, options=0, result=0x0) at /home/buildbot/buildbot/build/sql/sql_delete.cc:249
#16 0x00000000005d2955 in mysql_execute_command (thd=0x7f45f4b86008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:4041
#17 0x00000000005d6fe6 in mysql_parse (thd=0x7f45f4b86008, rawbuf=0x7f45e7c20020 "DELETE FROM t2 WHERE `c` = 'could' ORDER BY `pk` LIMIT 1", length=<optimized out>, parser_state=0x7f45e6b23010) at /home/buildbot/buildbot/build/sql/sql_parse.cc:7173
#18 0x00000000005da10d in dispatch_command (command=COM_QUERY, thd=0x7f45f4b86008, packet=0x7f45f4b8c009 "DELETE FROM t2 WHERE `c` = 'could' ORDER BY `pk` LIMIT 1", packet_length=56) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1470
#19 0x00000000005da517 in do_command (thd=0x7f45f4b86008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1093
#20 0x00000000006b3d90 in do_handle_one_connection (thd_arg=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1347
#21 0x00000000006b3fb2 in handle_one_connection (arg=0x7f45f4b86008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1258
#22 0x00007f4612ceeb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#23 0x00007f4611a2995d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()

Stack trace from debug build of 10.1 commit 93fc04ff1dc613a9ad75ccc57988a6b57b94b6cb

#3  <signal handler called>
#4  0x00007f25c7069af9 in TDC_element::free_tables_back (this=0x7f25a145f710) at 10.1/sql/table_cache.h:110
#5  0x00007f25c706759f in tc_add_table_callback (element=0x7f25a145f710, arg=0x7f25c686f9a0) at 10.1/sql/table_cache.cc:237
#6  0x00007f25c76f9d2d in lfind (head=0x7f25a737a270, cs=0x0, hashnr=0, key=0x7f25c686f9a0 "test", keylen=0, cursor=0x7f25c686f820, pins=0x7f25c3d5f180, callback=0x7f25c7067565 <tc_add_table_callback(TDC_element*, tc_add_table_arg*)>) at 10.1/mysys/lf_hash.c:125
#7  0x00007f25c76fa9a4 in lf_hash_iterate (hash=0x7f25c81a0040, pins=0x7f25c3d5f180, action=0x7f25c7067565 <tc_add_table_callback(TDC_element*, tc_add_table_arg*)>, argument=0x7f25c686f9a0) at 10.1/mysys/lf_hash.c:517
#8  0x00007f25c7069649 in tdc_iterate (thd=0x7f25a130c070, action=0x7f25c7067565 <tc_add_table_callback(TDC_element*, tc_add_table_arg*)>, argument=0x7f25c686f9a0, no_dups=false) at 10.1/sql/table_cache.cc:1082
#9  0x00007f25c706778c in tc_add_table (thd=0x7f25a130c070, table=0x7f25a14e8e70) at 10.1/sql/table_cache.cc:265
#10 0x00007f25c6e3491d in open_table (thd=0x7f25a130c070, table_list=0x7f25a142c290, ot_ctx=0x7f25c686ffd0) at 10.1/sql/sql_base.cc:2572
#11 0x00007f25c6e37070 in open_and_process_table (thd=0x7f25a130c070, lex=0x7f25a130f968, tables=0x7f25a142c290, counter=0x7f25c6870074, flags=0, prelocking_strategy=0x7f25c68700e0, has_prelocking_list=false, ot_ctx=0x7f25c686ffd0) at 10.1/sql/sql_base.cc:4016
#12 0x00007f25c6e38165 in open_tables (thd=0x7f25a130c070, options=..., start=0x7f25c6870058, counter=0x7f25c6870074, flags=0, prelocking_strategy=0x7f25c68700e0) at 10.1/sql/sql_base.cc:4526
#13 0x00007f25c6e3933e in open_and_lock_tables (thd=0x7f25a130c070, options=..., tables=0x7f25a142c290, derived=true, flags=0, prelocking_strategy=0x7f25c68700e0) at 10.1/sql/sql_base.cc:5184
#14 0x00007f25c6e2ce3b in open_and_lock_tables (thd=0x7f25a130c070, tables=0x7f25a142c290, derived=true, flags=0) at 10.1/sql/sql_base.h:530
#15 0x00007f25c6ea836a in execute_sqlcom_select (thd=0x7f25a130c070, all_tables=0x7f25a142c290) at 10.1/sql/sql_parse.cc:5718
#16 0x00007f25c6e9eb24 in mysql_execute_command (thd=0x7f25a130c070) at 10.1/sql/sql_parse.cc:2938
#17 0x00007f25c6eabdff in mysql_parse (thd=0x7f25a130c070, rawbuf=0x7f25a142c088 "SELECT * FROM t4 ORDER BY `pk` LIMIT 1", length=38, parser_state=0x7f25c68711f0) at 10.1/sql/sql_parse.cc:7173
#18 0x00007f25c6e9ad78 in dispatch_command (command=COM_QUERY, thd=0x7f25a130c070, packet=0x7f25a13ed071 "SELECT * FROM t4 ORDER BY `pk` LIMIT 1", packet_length=38) at 10.1/sql/sql_parse.cc:1470
#19 0x00007f25c6e99a98 in do_command (thd=0x7f25a130c070) at 10.1/sql/sql_parse.cc:1093
#20 0x00007f25c6fdb40c in do_handle_one_connection (thd_arg=0x7f25a130c070) at 10.1/sql/sql_connect.cc:1347
#21 0x00007f25c6fdb151 in handle_one_connection (arg=0x7f25a130c070) at 10.1/sql/sql_connect.cc:1258
#22 0x00007f25c654fb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#23 0x00007f25c45e495d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Comment by Sergey Vojtovich [ 2015-06-25 ]

serg, please review fix for this bug.

Generated at Thu Feb 08 07:26:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.