[MDEV-14272] Mariadb crashes with signal 11 when using federatedx engine and galera Created: 2017-11-03  Updated: 2018-07-25  Resolved: 2017-12-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Federated
Affects Version/s: 10.1
Fix Version/s: 10.1.31

Type: Bug Priority: Major
Reporter: Jiri Lunacek Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None
Environment:

Tested on CentOS 6 and CentOS 7.4
Mariadb 10.1.20, 10.1.25, 10.1.28.
Current package versions:
MariaDB-common-10.1.28-1.el7.centos.x86_64
MariaDB-server-10.1.28-1.el7.centos.x86_64
MariaDB-compat-10.1.28-1.el7.centos.x86_64
MariaDB-client-10.1.28-1.el7.centos.x86_64
MariaDB-shared-10.1.28-1.el7.centos.x86_64


Issue Links:
Relates
relates to MDEV-12951 Server crash [mysqld got exception 0x... Closed
relates to MDEV-14414 mysqld got signal 11 Closed
relates to MDEV-14523 [ERROR] mysqld got signal 11 Closed
relates to MDEV-15617 [ERROR] mysqld got signal 11 ; Closed

 Description   

This is a clone of closed bug MDEV-14048
We just encountered the crash on 10.1.28

Tested on versions above.

3 node galera installation, requests go to only one (primary node).

Some queries against a database full of federatedx engine tables causes the server to crash. After restart the node syncs back to the cluster with no apparent damage.

Not every query against the federatedx engine tables causes the crash. We were not able to identify the exact query. The ones suggested by log of query pointer after crash executed successfully when run by hand in console.

Additional note:
Disabling query cache seemed to have prevented the crash, but sent our server performance to the gutter. So we re-enabled the case and stopped using federatedx engine tables.

Crash log:

171103  8:13:21 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.28-MariaDB
key_buffer_size=67108864
read_buffer_size=262144
max_used_connections=91
max_threads=3002
thread_count=15
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1574808519 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f2d5bff4008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f3039cd7cb0 thread_stack 0xa0000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d203c0092e]
/usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55d203723ce5]
/lib64/libpthread.so.0(+0xf5e0)[0x7f31aac0b5e0]
/usr/sbin/mysqld(_ZN17Query_cache_block11headers_lenEv+0x0)[0x55d20355d810]
/usr/sbin/mysqld(_ZN17Query_cache_block4dataEv+0x11)[0x55d20355d831]
/usr/sbin/mysqld(_ZN11Query_cache6insertEP3THDP15Query_cache_tlsPKcmj+0x5c)[0x55d20356121c]
/usr/sbin/mysqld(net_real_write+0x45)[0x55d20350ae05]
/usr/sbin/mysqld(net_flush+0x23)[0x55d20350b143]
/usr/sbin/mysqld(net_write_command+0x3fb)[0x55d20350b8ab]
/usr/sbin/mysqld(cli_advanced_command+0xe2)[0x55d203703782]
/usr/sbin/mysqld(mysql_close_slow_part+0x54)[0x55d203700d44]
/usr/sbin/mysqld(mysql_close+0x1a)[0x55d203700d7a]
/usr/lib64/mysql/plugin/ha_federatedx.so(_ZN19federatedx_io_mysqlD1Ev+0x24)[0x7f309b1f9094]
/usr/lib64/mysql/plugin/ha_federatedx.so(_ZN14federatedx_txn5closeEP18st_fedrated_server+0x69)[0x7f309b1f83b9]
/usr/lib64/mysql/plugin/ha_federatedx.so(+0xa6ff)[0x7f309b1f16ff]
/usr/lib64/mysql/plugin/ha_federatedx.so(+0xa8b5)[0x7f309b1f18b5]
/usr/lib64/mysql/plugin/ha_federatedx.so(_ZN13ha_federatedx5closeEv+0x98)[0x7f309b1f5398]
/usr/sbin/mysqld(_Z8closefrmP5TABLEb+0x38)[0x55d203643bf8]
/usr/sbin/mysqld(_Z18intern_close_tableP5TABLE+0x36)[0x55d203550466]
/usr/sbin/mysqld(_Z12tc_add_tableP3THDP5TABLE+0x42c)[0x55d2036d7dac]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP18Open_table_context+0xccd)[0x55d20355300d]
/usr/sbin/mysqld(_Z11open_tablesP3THDRK14DDL_options_stPP10TABLE_LISTPjjP19Prelocking_strategy+0x100a)[0x55d20355735a]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDRK14DDL_options_stP10TABLE_LISTbjP19Prelocking_strategy+0x34)[0x55d203557684]
mysys/stacktrace.c:268(my_print_stacktrace)[0x55d2034d9fcc]
sql/sql_cache.cc:844(Query_cache_block::headers_len())[0x55d20359e9b0]
sql-common/client.c:3963(me/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.1.28/sql-common/client.c:3953)[0x55d2035a1702]
sql/sql_string.h:312(String::free())[0x55d2035a2021]
sql/table_cache.cc:291(tc_add_table(THD*, TABLE*))[0x55d2035a4971]
sql/sql_parse.cc:5833(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55d2035a59c6]
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x55d20367161a]
sql/sql_connect.cc:1263(handle_one_connection)[0x55d2036717c0]
/lib64/libpthread.so.0(+0x7e25)[0x7f31aac03e25]
/lib64/libc.so.6(clone+0x6d)[0x7f31a8fa734d]



 Comments   
Comment by Elena Stepanova [ 2017-11-07 ]

Thanks for the report.

It appears that MDEV-12951 also affects 10.1, while we only targeted and fixed in for 10.2.
Same test case from MDEV-12951:

--source include/have_innodb.inc
 
INSTALL SONAME 'ha_federated';
 
SET GLOBAL query_cache_size= 16*1024*1024;
SET GLOBAL query_cache_type= 1;
 
CREATE TABLE t1 (i INT) ENGINE=InnoDB;
eval
CREATE TABLE t2 (i INT) ENGINE=FEDERATED 
  CONNECTION="mysql://root@localhost:$MASTER_MYPORT/test/t1";
 
SELECT * FROM t2;

10.1 28c3459aa7c5db94

#3  <signal handler called>
#4  0x000055becbeda765 in QUERY_PROFILE::new_status (this=0x8f8f8f8f8f8f8f8f, status_arg=0x55becc6d476a "Waiting for query cache lock", function_arg=0x55becc6e9b1b <Query_cache::try_lock(THD*, Query_cache::Cache_try_lock_mode)::__FUNCTION__> "try_lock", file_arg=0x55becc6e7260 "/data/src/10.1/sql/sql_cache.cc", line_arg=603) at /data/src/10.1/sql/sql_profile.cc:312
#5  0x000055becbce4de8 in PROFILING::status_change (this=0x7f490a38b578, status_arg=0x55becc6d476a "Waiting for query cache lock", function_arg=0x55becc6e9b1b <Query_cache::try_lock(THD*, Query_cache::Cache_try_lock_mode)::__FUNCTION__> "try_lock", file_arg=0x55becc6e7260 "/data/src/10.1/sql/sql_cache.cc", line_arg=603) at /data/src/10.1/sql/sql_profile.h:312
#6  0x000055becbce53e2 in THD::enter_stage (this=0x7f490a388070, stage=0x55becce85db0 <stage_waiting_for_query_cache_lock>, calling_func=0x55becc6e9b1b <Query_cache::try_lock(THD*, Query_cache::Cache_try_lock_mode)::__FUNCTION__> "try_lock", calling_file=0x55becc6e7260 "/data/src/10.1/sql/sql_cache.cc", calling_line=603) at /data/src/10.1/sql/sql_class.h:2066
#7  0x000055becbd604be in set_thd_stage_info (thd_arg=0x7f490a388070, new_stage=0x55becce85db0 <stage_waiting_for_query_cache_lock>, old_stage=0x7f49152cda68, calling_func=0x55becc6e9b1b <Query_cache::try_lock(THD*, Query_cache::Cache_try_lock_mode)::__FUNCTION__> "try_lock", calling_file=0x55becc6e7260 "/data/src/10.1/sql/sql_cache.cc", calling_line=603) at /data/src/10.1/sql/sql_class.cc:557
#8  0x000055becbd5e89e in Query_cache_wait_state::Query_cache_wait_state (this=0x7f49152cda60, thd=0x7f490a388070, func=0x55becc6e9b1b <Query_cache::try_lock(THD*, Query_cache::Cache_try_lock_mode)::__FUNCTION__> "try_lock", file=0x55becc6e7260 "/data/src/10.1/sql/sql_cache.cc", line=603) at /data/src/10.1/sql/sql_cache.cc:432
#9  0x000055becbd50fd8 in Query_cache::try_lock (this=0x55becd0814c0 <query_cache>, thd=0x7f490a388070, mode=Query_cache::WAIT) at /data/src/10.1/sql/sql_cache.cc:603
#10 0x000055becbd51fd5 in Query_cache::insert (this=0x55becd0814c0 <query_cache>, thd=0x7f490a388070, query_cache_tls=0x7f490a388328, packet=0x7f48fce69070 "\001", length=5, pkt_nr=1) at /data/src/10.1/sql/sql_cache.cc:1082
#11 0x000055becbd51f05 in query_cache_insert (thd_arg=0x7f490a388070, packet=0x7f48fce69070 "\001", length=5, pkt_nr=1) at /data/src/10.1/sql/sql_cache.cc:1057
#12 0x000055becbce67fa in net_real_write (net=0x7f48fcc68a70, packet=0x7f48fce69070 "\001", len=5) at /data/src/10.1/sql/net_serv.cc:606
#13 0x000055becbce6129 in net_flush (net=0x7f48fcc68a70) at /data/src/10.1/sql/net_serv.cc:363
#14 0x000055becbce653e in net_write_command (net=0x7f48fcc68a70, command=1 '\001', header=0x0, head_len=0, packet=0x0, len=0) at /data/src/10.1/sql/net_serv.cc:501
#15 0x000055becbfa38e1 in cli_advanced_command (mysql=0x7f48fcc68a70, command=COM_QUIT, header=0x0, header_length=0, arg=0x0, arg_length=0, skip_check=1 '\001', stmt=0x0) at /data/src/10.1/sql-common/client.c:701
#16 0x000055becbfaabbf in mysql_close_slow_part (mysql=0x7f48fcc68a70) at /data/src/10.1/sql-common/client.c:3950
#17 0x000055becbfaac38 in mysql_close (mysql=0x7f48fcc68a70) at /data/src/10.1/sql-common/client.c:3962
#18 0x00007f49130b5ec3 in ha_federated::close (this=0x7f48fccb0888) at /data/src/10.1/storage/federated/ha_federated.cc:1693
#19 0x000055becbfe6fcb in handler::ha_close (this=0x7f48fccb0888) at /data/src/10.1/sql/handler.cc:2596
#20 0x000055becbe97fc5 in closefrm (table=0x7f48fcc81c70, free_share=true) at /data/src/10.1/sql/table.cc:3054
#21 0x000055becbd394c3 in intern_close_table (table=0x7f48fcc81c70) at /data/src/10.1/sql/sql_base.cc:354
#22 0x000055becbf689e5 in tc_purge (mark_flushed=true) at /data/src/10.1/sql/table_cache.cc:204
#23 0x000055becbd398e6 in close_cached_tables (thd=0x0, tables=0x0, wait_for_refresh=false, timeout=31536000) at /data/src/10.1/sql/sql_base.cc:485
#24 0x000055becbf6935c in tdc_start_shutdown () at /data/src/10.1/sql/table_cache.cc:460
#25 0x000055becbcd5e02 in clean_up (print_message=true) at /data/src/10.1/sql/mysqld.cc:2120
#26 0x000055becbcd5a28 in unireg_end () at /data/src/10.1/sql/mysqld.cc:2002
#27 0x000055becbcd593f in kill_server (sig_ptr=0x0) at /data/src/10.1/sql/mysqld.cc:1930
#28 0x000055becbcd5967 in kill_server_thread (arg=0x7f49153f61d0) at /data/src/10.1/sql/mysqld.cc:1953
#29 0x000055becc292b98 in pfs_spawn_thread (arg=0x7f48fc42c0f0) at /data/src/10.1/storage/perfschema/pfs.cc:1861
#30 0x00007f4914ff1494 in start_thread (arg=0x7f49152ceb00) at pthread_create.c:333
#31 0x00007f49133aa93f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Generated at Thu Feb 08 08:12:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.