[MDEV-10898] mysqld signal 11 (10.1.17-MariaDB-1~xenial, galera 25.3.17-xenial) Created: 2016-09-26  Updated: 2019-05-21  Resolved: 2019-05-21

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Brendan P Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 1
Labels: None


 Description   

We've recently set up a 3-node cluster with one server acting as master and doing all the writes. Every so often one of the other two nodes, which carry very little read load, will crash; perhaps one every few days, and then the other a day or two later. The load on them doesn't appear to matter. The backtrace again goes through libpthread (MDEV-10768 looks similar).

160926 11:38:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.17-MariaDB-1~xenial
key_buffer_size=16777216
read_buffer_size=524288
max_used_connections=1501
max_threads=1537
thread_count=56
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 7129709 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7ee0ce412008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f11d75f7cd8 thread_stack 0x48400
(my_addr_resolve failure: fork)
/usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x564e8628753e]
/usr/sbin/mysqld(handle_fatal_signal+0x2d5) [0x564e85dd62e5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0) [0x7f11da84c3d0]
/usr/sbin/mysqld(Field_blob::get_length(unsigned char const*, unsigned int)+0x28) [0x564e85dc3e38]
/usr/sbin/mysqld(Field_blob::unpack(unsigned char*, unsigned char const*, unsigned char const*, unsigned int)+0x3d) [0x564e85dc47fd]
/usr/sbin/mysqld(unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*)+0x370) [0x564e85ea9660]
/usr/sbin/mysqld(Rows_log_event::write_row(rpl_group_info*, bool)+0xf7) [0x564e85ea5207]
/usr/sbin/mysqld(Write_rows_log_event::do_exec_row(rpl_group_info*)+0x7d) [0x564e85ea568d]
/usr/sbin/mysqld(Rows_log_event::do_apply_event(rpl_group_info*)+0x336) [0x564e85e98f86]
/usr/sbin/mysqld(wsrep_apply_cb+0x632) [0x564e85d82df2]
/usr/lib/galera/libgalera_smm.so(galera::TrxHandle::apply(void*, wsrep_cb_status (*)(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*), wsrep_trx_meta const&) const+0x106) [0x7f11d6d21216]
/usr/lib/galera/libgalera_smm.so(+0x22761f) [0x7f11d6d7261f]
/usr/lib/galera/libgalera_smm.so(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandle*)+0xbe) [0x7f11d6d751fe]
/usr/lib/galera/libgalera_smm.so(galera::ReplicatorSMM::process_trx(void*, galera::TrxHandle*)+0x13e) [0x7f11d6d77c1e]
/usr/lib/galera/libgalera_smm.so(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&)+0x1d8) [0x7f11d6d4db98]
/usr/lib/galera/libgalera_smm.so(galera::GcsActionSource::process(void*, bool&)+0x76) [0x7f11d6d4f8e6]
/usr/lib/galera/libgalera_smm.so(galera::ReplicatorSMM::async_recv(void*)+0x83) [0x7f11d6d781f3]
/usr/lib/galera/libgalera_smm.so(galera_recv+0x2b) [0x7f11d6d923db]
/usr/sbin/mysqld(+0x541de7) [0x564e85d83de7]
/usr/sbin/mysqld(start_wsrep_THD+0x465) [0x564e85d73905]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f11da8426fa]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f11d9eedb5d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): 
Connection ID (thread ID): 12
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Brendan P [ 2016-10-10 ]

Confirmed on 10.1.18 as well.

Comment by Nirbhay Choubey (Inactive) [ 2016-10-10 ]

spikestabber Do you have the exact query that causes this? You could find it in the server's general log or binary log.

Comment by Brendan P [ 2016-10-10 ]

There is no specific query that I know of, because we haven't been passing any traffic whatsoever to the nodes that crash randomly. Nodes B and C will SST and sync up fine from node A, then crash a day or two later with the above error. They never crash at the same time, which should rule out the theory that it's a specific query passed via replication. Node A (doing all the writes) never crashes; just B and C.

Anyhow, do you happen to have a debug-compiled binary you can attach so we can investigate further?
Thanks,

Comment by Nirbhay Choubey (Inactive) [ 2016-10-10 ]

Hi spikestabber,

There is no specific query that I know of, because we haven't been passing any traffic whatsoever to the nodes that crash randomly. Nodes B and C will SST and sync up fine from node A, then crash a day or two later with the above error. They never crash at the same time, which should rule out the theory that it's a specific query passed via replication. Node A (doing all the writes) never crashes; just B and C.

The node (B or C, in this case) crashes while applying the events generated on node A. So, we need to find the exact query/row event that causes this.
You can enable the general log on B and C to record the commands being executed, and enable binary logging (--log-bin, together with --log-slave-updates) so that node B/C also logs the binary log events it receives from node A.
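For reference, a minimal my.cnf sketch for the diagnostic setup described above (the option names are standard MariaDB server variables; the file paths are illustrative):

[mysqld]
# Log every statement the node executes (heavy; for diagnosis only)
general_log        = 1
general_log_file   = /var/log/mysql/general.log
# Write received replication events to this node's own binary log
log_bin            = /var/log/mysql/mysql-bin
log_slave_updates  = 1

With this in place on nodes B and C, the binary log written around the time of the next crash should contain the row event that was being applied.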

Anyhow, do you happen to have a debug-compiled binary you can attach so we can investigate further?

No, but you can build one yourself. Here are the instructions : https://mariadb.com/kb/en/mariadb/generic-build-instructions/
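As a rough sketch of what the KB page walks through (branch name and paths are illustrative, and the full instructions on the page take precedence):

git clone https://github.com/MariaDB/server.git
cd server
git checkout 10.1
cmake . -DCMAKE_BUILD_TYPE=Debug
make

A binary built with CMAKE_BUILD_TYPE=Debug keeps symbols and assertions, which makes the resulting backtraces far more useful than the release-build trace above.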

Generated at Thu Feb 08 07:45:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.