[MDEV-14161] frequently crashes for unknown reason Created: 2017-10-27  Updated: 2018-10-02

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.1.28
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Uwe Muenzberg Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

DragonFlyBSD 4.8


Attachments: File WS-00034GHRH2.lit.msb.msc.err_delphi-client     HTML File crashlog     Text File errorlog.txt     Text File gdb1.txt     Text File gdb2.txt     Text File gdb2.txt     File libmysql.dll     File master1.err     File my.cnf     File mysql.pas    

 Description   

The server crashes frequently (2 times per month) with signal 10, sometimes with signal 11. We already saw the bug in earlier versions, but this time we got a core dump. We can upload the core dump and the mysql_safe binary if needed.

171026 18:45:06 [ERROR] mysqld got signal 10 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.28-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=133
max_threads=153
thread_count=62
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467072 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x821e50008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48400
0x19972e9 <my_print_stacktrace+0x29> at /usr/local/libexec/mysqld
0x158aab5 <handle_fatal_signal+0x2f5> at /usr/local/libexec/mysqld
0x7ffffffff003 <???> at ???
0x801eae0a9 <???> at ???
0x801ea88f1 <???> at ???
0x801ea8c18 <???> at ???
0x802365cde <_tcb_dtor+0x9> at /usr/lib/libpthread.so.0
0x8023653a3 <_thr_free+0x62> at /usr/lib/libpthread.so.0
0x8023657bc <_thr_gc+0x1ca> at /usr/lib/libpthread.so.0
0x8023657f1 <_thr_alloc+0x21> at /usr/lib/libpthread.so.0
0x80235fe51 <pthread_create+0x6b> at /usr/lib/libpthread.so.0
0x139442a <_Z34create_thread_to_handle_connectionP3THD+0xba> at /usr/local/libexec/mysqld
0x1394f61 <_Z26handle_connections_socketsv+0x711> at /usr/local/libexec/mysqld
0x139c639 <_Z11mysqld_mainiPPc+0x2719> at /usr/local/libexec/mysqld
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 377520
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
171026 18:45:25 mysqld_safe mysqld restarted



 Comments   
Comment by Elena Stepanova [ 2017-10-30 ]

um,

We don't have Dragonfly in-house. For starters, could you please get all threads' stack trace and full stack trace from the coredump that you stored, like

gdb --batch --eval-command="thread apply all bt"
gdb --batch --eval-command="thread apply all bt full"

(or whatever debugging equivalent you have on the system).
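As written, the two gdb commands above name neither the executable nor the core file, and gdb needs both to read a core dump. A minimal sketch of the full invocation follows, wrapped in Python so the output can be saved to a file for attaching. The helper names `gdb_backtrace_cmd` and `dump_backtraces` and the core file path are assumptions for illustration; the binary path is the one shown in the error log.

```python
import shlex
import subprocess
from typing import List


def gdb_backtrace_cmd(binary: str, core: str, full: bool = False) -> List[str]:
    """Build a non-interactive gdb command that prints the stack of every thread."""
    bt = "thread apply all bt full" if full else "thread apply all bt"
    # gdb expects the executable first, then the core file.
    return ["gdb", "--batch", f"--eval-command={bt}", binary, core]


def dump_backtraces(binary: str, core: str, out_path: str, full: bool = False) -> None:
    """Run gdb and write everything it prints (stdout and stderr) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_cmd(binary, core, full),
                       stdout=out, stderr=subprocess.STDOUT, check=False)


# Hypothetical core file location; use the path where the core was actually written.
cmd = gdb_backtrace_cmd("/usr/local/libexec/mysqld", "/path/to/mysqld.core", full=True)
print(shlex.join(cmd))  # shows the shell command without running gdb
```

Calling `dump_backtraces(...)` with the same arguments and an output file name would produce a text file suitable for attaching to the issue.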
Please also attach your cnf file(s) and the error log beginning from server startup and up to the end of server restart after the crash.

Thank you.

Comment by Uwe Muenzberg [ 2017-11-01 ]

Here are our my.cnf, the error log, and the output of gdb --batch --eval-command="thread apply all bt full" (gdb2.txt). It seems
there are no debugging symbols; we hope it still helps.
The server is the master in a replication setup with one slave. The slave (same hardware, same OS) has been stable for many months.

my.cnf master1.err gdb2.txt

Comment by Uwe Muenzberg [ 2017-12-04 ]

We upgraded our server to 5.0-RELEASE of DragonFlyBSD this weekend, because we saw some changes in libthread.so.
We hope that these changes also fix our issues. I will report back no later than January on whether it is stable now.

Comment by Uwe Muenzberg [ 2017-12-13 ]

The OS update did not solve our issues. We are not sure whether this is the same problem, but we saw another crash this morning.
I have attached the relevant part of the log file and the output of the gdb commands.
crashlog gdb1.txt gdb2.txt

Comment by Elena Stepanova [ 2017-12-28 ]

um, could you please clarify, are you only getting the second kind of the crash after the upgrade (SIGSEGV on a query), or are you still getting the one in libthread, too?

I don't know what we can do about the libthread one, there isn't much happening on MariaDB side; but the second one might well relate to MariaDB.

Comment by Uwe Muenzberg [ 2018-01-03 ]

Over the last months we saw one or two crashes per month. The server has now been stable since 2017-12-13. If we see another crash, I will report back.

Comment by Uwe Muenzberg [ 2018-01-24 ]

We upgraded to 10.1.30 on 2018-01-06. Yesterday the server crashed again. The addr2line process hung, and we killed it manually this morning. After that, mysql restarted.

errorlog.txt

Comment by Elena Stepanova [ 2018-01-26 ]

From the attached error log (to make it searchable):

180123 21:04:47 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.30-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=84
max_threads=153
thread_count=52
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467099 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x824d99008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7ffffb6db348 thread_stack 0x48400
0x1998e49 <my_print_stacktrace+0x29> at /usr/local/libexec/mysqld
0x158bcc5 <handle_fatal_signal+0x2f5> at /usr/local/libexec/mysqld
0x7fffffdfe003 <???> at ???
0x158c487 <_ZN7handler26get_dynamic_partition_infoEP15PARTITION_STATSj+0xc7> at /usr/local/libexec/mysqld
0x198bd0b <my_qsort+0x58b> at /usr/local/libexec/mysqld
0x1471688 <_Z21fill_key_cache_tablesP3THDP10TABLE_LISTP4Item+0x368> at /usr/local/libexec/mysqld
0x14854cb <_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x9ab> at /usr/local/libexec/mysqld
0x1486cd3 <_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x283> at /usr/local/libexec/mysqld
0x146d5fe <_ZN4JOIN10exec_innerEv+0x6de> at /usr/local/libexec/mysqld
0x146f69b <_ZN4JOIN4execEv+0x4b> at /usr/local/libexec/mysqld
0x146c137 <_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xe7> at /usr/local/libexec/mysqld
0x146cb50 <_Z13handle_selectP3THDP3LEXP13select_resultm+0x130> at /usr/local/libexec/mysqld
0x1376471 <_init+0x21e1> at /usr/local/libexec/mysqld
0x142075a <_Z21mysql_execute_commandP3THD+0x6b4a> at /usr/local/libexec/mysqld
0x14227ff <_Z11mysql_parseP3THDPcjP12Parser_state+0x2ff> at /usr/local/libexec/mysqld
0x1425bee <_Z16dispatch_command19enum_server_commandP3THDPcj+0x238e> at /usr/local/libexec/mysqld
0x14263a4 <_Z10do_commandP3THD+0x134> at /usr/local/libexec/mysqld
0x14e18c9 <_Z24do_handle_one_connectionP3THD+0x189> at /usr/local/libexec/mysqld
0x14e1a77 <handle_one_connection+0x37> at /usr/local/libexec/mysqld
0x80235cdf9 <pthread_detach+0x27e> at /usr/lib/libpthread.so.0
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x81ef39020): is an invalid pointer
Connection ID (thread ID): 2388760
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: SELECT * FROM REFERENTIAL_CONSTRAINTS LIMIT 1

Comment by Uwe Muenzberg [ 2018-04-11 ]

Our server has now been stable for more than 2 months. We believe, but are not absolutely sure, that our issue was triggered by an old client. This client, written in Delphi 3, uses an old component and libmysql for database access. The client connected, ran a few queries, and disconnected 6 times per minute. About 5 to 8 of these clients were on the network. On each connect, the client read the information schema to provide functions like FieldByName() etc. We often saw the typical queries in the processlist. The client was refactored to keep the connection open and disconnect only when the client is closed. It seems that this change solved our issue.
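The refactoring described here (keep one connection open and disconnect only when the client closes) can be sketched as follows. This is an illustration in Python, not the Delphi client's actual code: the class name `PersistentClient` and the `connect` factory are hypothetical, and the factory is assumed to return a DB-API 2.0 style connection.

```python
from typing import Any, Callable, Optional


class PersistentClient:
    """Opens one connection lazily and reuses it until close() is called."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        # Hypothetical factory returning a DB-API 2.0 style connection.
        self._connect = connect
        self._conn: Optional[Any] = None

    def query(self, sql: str) -> list:
        """Run one query on the shared connection and return all rows."""
        if self._conn is None:
            # Connect only on first use, not on every polling cycle.
            self._conn = self._connect()
        cur = self._conn.cursor()
        try:
            cur.execute(sql)
            return list(cur.fetchall())
        finally:
            cur.close()

    def close(self) -> None:
        """Disconnect; intended to be called once, when the client shuts down."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

With this shape, a client polling 6 times per minute causes one connect for its whole lifetime instead of one per poll, so any per-connect work such as reading the information schema happens only once.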

Comment by Elena Stepanova [ 2018-07-20 ]

um, do you happen to remember whether you upgraded to 10.1.31 before the failure stopped happening?
Timing fits, on April 11th you wrote that you hadn't experienced the problem for over 2 months, and 10.1.31 was released in the beginning of February.
If you indeed upgraded soon after release, the disappearance of the problem could be related to the bugfix released in MDEV-11539.

Comment by Uwe Muenzberg [ 2018-07-21 ]

We used 10.1.30 until 2018-05-22 and then upgraded to 10.1.32. We never used 10.1.31. Our systems have been stable since February, when we saw our last crash (without any debugging hints).

Comment by Elena Stepanova [ 2018-07-21 ]

Thanks for the info.

Comment by Uwe Muenzberg [ 2018-10-02 ]

I believe I can explain now what action caused the issue.

We use some database clients written in Delphi 3. All of them use the component
TMySQL for database operations. When a program connects to the database using this component,
it reads the structure of all databases on the server from information_schema to provide
a function called "getDatabases". If many databases are hosted on the server, this can take
a long time and puts a heavy load on the server and the client machine.

I built a test client using this component that connects to the database server, runs
a simple query, and disconnects. The client repeats this action, driven by a timer,
about 12 times per minute. If I run about 30 of these clients at the same time,
the server crashes some time after the test starts (see the attached log file).
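The load pattern of this test (about 30 clients, each doing connect, query, disconnect about 12 times per minute) can be approximated with the sketch below. It is not the Delphi test client and does not include the TMySQL component's information_schema scan on connect. The function names and the `connect` factory are hypothetical; the factory is assumed to return a DB-API 2.0 style connection.

```python
import threading
import time
from typing import Any, Callable


def connect_query_disconnect(connect: Callable[[], Any], sql: str) -> None:
    """One cycle of the pattern under test: open, run one query, close."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(sql)
        cur.fetchall()
        cur.close()
    finally:
        conn.close()


def run_clients(connect: Callable[[], Any], sql: str = "SELECT 1",
                clients: int = 30, cycles: int = 12, interval: float = 5.0) -> int:
    """Run `clients` threads, each doing `cycles` cycles with `interval` seconds between them.

    The defaults give 30 clients at 12 cycles per minute for one minute.
    Returns the number of cycles that raised an exception.
    """
    failures = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal failures
        for _ in range(cycles):
            try:
                connect_query_disconnect(connect, sql)
            except Exception:
                # Count the failure and keep going, as a timer-driven client would.
                with lock:
                    failures += 1
            time.sleep(interval)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures
```

To get closer to the original client's behavior, the `sql` argument could be an information_schema query such as the one in the 2018-01-23 crash log; whether that reproduces the crash is not something this sketch can confirm.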

Because we never used the getDatabases function, we removed this code from the
component. This dramatically reduced the server load and the time for the connect
operation, and in the end the server now seems to be stable.

FYI, I have attached the source code of TMySQL and the associated DLL.

WS-00034GHRH2.lit.msb.msc.err_delphi-client libmysql.dll mysql.pas
