[MDEV-24279] Segfault after 1 day and 5 minutes uptime Created: 2020-11-25  Updated: 2021-05-03  Resolved: 2020-12-12

Status: Closed
Project: MariaDB Server
Component/s: Information Schema, Plugin - feedback
Affects Version/s: 10.3.27, 10.3, 10.4, 10.5
Fix Version/s: 10.3.28, 10.4.18, 10.5.9

Type: Bug Priority: Blocker
Reporter: Kees Hoekzema Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-24315 Mariadb crash schema_table_store_reco... Closed

 Description   

Operating system: CentOS 7
Yum repository: http://yum.mariadb.org/10.3/centos7-amd64
CPU & mem: 2x Intel Gold 6128 CPU, 256G memory

After upgrading from 10.3.21 to 10.3.27, MariaDB started crashing after roughly one day of uptime. After a few crashes a pattern emerged: it crashes after exactly 1 day and 5 minutes.

2020-11-16 11:13:53 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-17 11:18:54 [ERROR] mysqld got signal 11 ;
 
2020-11-17 11:19:28 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-18 11:24:28 [ERROR] mysqld got signal 11 ;
 
2020-11-18 11:25:01 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-19 11:30:01 [ERROR] mysqld got signal 11 ;
 
2020-11-19 11:30:36 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-20 11:35:36 [ERROR] mysqld got signal 11 ;
 
2020-11-20 11:36:22 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-21 11:41:23 [ERROR] mysqld got signal 11 ;
 
2020-11-21 11:42:12 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-22 11:47:12 [ERROR] mysqld got signal 11 ;
 
2020-11-22 11:47:58 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-23 11:52:58 [ERROR] mysqld got signal 11 ;
 
2020-11-24  8:59:17 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-11-25  9:04:17 [ERROR] mysqld got signal 11 ;

The stacktrace is the same for every segfault:

Thread pointer: 0x7f636c0eae58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6391b1a080 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55e20502266e]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55e204ab9e7f]
/lib64/libpthread.so.0(+0xf630)[0x7f857b611630]
/usr/sbin/mysqld(_Z25schema_table_store_recordP3THDP5TABLE+0x39)[0x55e2049421a9]
/usr/sbin/mysqld(+0x63c2fa)[0x55e2049462fa]
/usr/sbin/mysqld(_Z14fill_variablesP3THDP10TABLE_LISTP4Item+0x126)[0x55e204949706]
/usr/sbin/mysqld(+0xcd6fd8)[0x55e204fe0fd8]
/usr/sbin/mysqld(+0x4f4bc9)[0x55e2047febc9]
/usr/sbin/mysqld(+0xcd717a)[0x55e204fe117a]
pthread_create.c:0(start_thread)[0x7f857b609ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f85799a996d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 6
Status: NOT_KILLED

Judging by the low connection ID, this looks like a system thread.

The full log of the latest crash:

201125  9:04:17 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.3.27-MariaDB
key_buffer_size=1610612736
read_buffer_size=4194304
max_used_connections=95
max_threads=452
thread_count=73
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 7137201 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f636c0eae58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6391b1a080 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55e20502266e]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55e204ab9e7f]
/lib64/libpthread.so.0(+0xf630)[0x7f857b611630]
/usr/sbin/mysqld(_Z25schema_table_store_recordP3THDP5TABLE+0x39)[0x55e2049421a9]
/usr/sbin/mysqld(+0x63c2fa)[0x55e2049462fa]
/usr/sbin/mysqld(_Z14fill_variablesP3THDP10TABLE_LISTP4Item+0x126)[0x55e204949706]
/usr/sbin/mysqld(+0xcd6fd8)[0x55e204fe0fd8]
/usr/sbin/mysqld(+0x4f4bc9)[0x55e2047febc9]
/usr/sbin/mysqld(+0xcd717a)[0x55e204fe117a]
pthread_create.c:0(start_thread)[0x7f857b609ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f85799a996d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 6
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1028777              1028777              processes 
Max open files            16384                16384                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       1028777              1028777              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core



 Comments   
Comment by Elena Stepanova [ 2020-11-25 ]

Do you have the feedback plugin enabled?
If so, please try disabling it and see if that helps.

Comment by Kees Hoekzema [ 2020-11-25 ]

The feedback plugin is indeed loaded. I will disable it for now and wait at least a day (well, two, as I need to wait for off-hours to restart the server; `uninstall plugin` doesn't work).

Comment by Elena Stepanova [ 2020-11-25 ]

The failure appeared in 10.3 after this commit:

commit e64084d5a3a72462fa6263d1d0a86e72c0ba0d47
Author: Sergei Golubchik
Date:   Sat Aug 1 13:12:50 2020 +0200
 
    MDEV-21201 No records produced in information_schema query, depending on projection

Backtrace on 10.5 (commit 657fcdf430):

#3  <signal handler called>
#4  0x000055863b460f88 in heap_write (info=0x0, record=0x7f976804ce80 "\377\023") at /data/src/10.5-bug/storage/heap/hp_write.c:37
#5  0x000055863b459b3c in ha_heap::write_row (this=0x7f9768044020, buf=0x7f976804ce80 "\377\023") at /data/src/10.5-bug/storage/heap/ha_heap.cc:239
#6  0x000055863ad2d0ee in handler::ha_write_tmp_row (this=0x7f9768044020, buf=0x7f976804ce80 "\377\023") at /data/src/10.5-bug/sql/sql_class.h:7029
#7  0x000055863ad409bb in schema_table_store_record (thd=0x7f9768031b48, table=0x7f9768030250) at /data/src/10.5-bug/sql/sql_show.cc:3868
#8  0x000055863ad40734 in show_status_array (thd=0x7f9768031b48, wild=0x0, variables=0x7f976805bdd0, scope=SHOW_OPT_GLOBAL, status_var=0x0, prefix=0x55863ba734a8 "", table=0x7f9768030250, ucase_names=true, cond=0x7f9768057a08) at /data/src/10.5-bug/sql/sql_show.cc:3787
#9  0x000055863ad5246c in fill_variables (thd=0x7f9768031b48, tables=0x7f977dbf27a0, cond=0x7f97680385f8) at /data/src/10.5-bug/sql/sql_show.cc:7826
#10 0x000055863b9d18cb in feedback::fill_feedback (thd=0x7f9768031b48, tables=0x7f977dbf27a0, unused=0x0) at /data/src/10.5-bug/plugin/feedback/feedback.cc:215
#11 0x000055863b9d302f in feedback::send_report (when=0x0) at /data/src/10.5-bug/plugin/feedback/sender_thread.cc:211
#12 0x000055863b9d3431 in feedback::background_thread (arg=0x0) at /data/src/10.5-bug/plugin/feedback/sender_thread.cc:282
#13 0x00007f97a319e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f97a2d72293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

To reproduce, start the server with --feedback --feedback_debug_first_interval=5 --feedback_debug_startup_interval=5, leaving all other options at their defaults.

Comment by Juan Gabriel Covas [ 2020-12-02 ]

Hello, I can confirm the crash is gone after disabling the feedback plugin in the config (feedback=OFF) and restarting MariaDB; I waited one day to check. The same applies to my duplicate ticket, MDEV-24315.

Generated at Thu Feb 08 09:28:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.