[MDEV-10768] Galera segfault on transaction abort after failed wsrep_grant_mdl_exception() Created: 2016-09-07  Updated: 2016-09-29  Resolved: 2016-09-29

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.14
Fix Version/s: 10.1.18

Type: Bug
Priority: Major
Reporter: Hartmut Holzgraefe
Assignee: Nirbhay Choubey (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-9416 MariaDB galera got signal 11 when alt... Closed

 Description   

A Galera node crashed with a segmentation fault in `ha_abort_transaction()` calling an unknown pthread library function.

The backtrace looks similar to MDEV-9416 but the originating statement here was an `ALTER TABLE`, most likely the addition of another partition to a partitioned table.

The weird part is that the actual crash happens in libpthread, but I don't see anything in ha_abort_transaction() calling any pthread functions, with the exception of the DBUG_ macros. As this happened with a non-debug production binary, these should not be compiled in, right?

160831 20:50:13 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.14-MariaDB
key_buffer_size=1073741824
read_buffer_size=131072
max_used_connections=401
max_threads=1002
thread_count=438
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3249432 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f67b9412008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f679b035d40 thread_stack 0x48400
(my_addr_resolve failure: fork)
/usr/sbin/mysqld(my_print_stacktrace+0x2b) [0x7f96ebc9797b]
/usr/sbin/mysqld(handle_fatal_signal+0x475) [0x7f96eb7f6825]
/lib64/libpthread.so.0(+0x3062a0f710) [0x7f96eadfc710]
/usr/sbin/mysqld(ha_abort_transaction(THD*, THD*, char)+0xa7) [0x7f96eb800807]
/usr/sbin/mysqld(wsrep_abort_thd(void*, void*, char)+0x154) [0x7f96eb7a7b54]
/usr/sbin/mysqld(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*)+0x1c9) [0x7f96eb799669]
/usr/sbin/mysqld(MDL_lock::can_grant_lock(enum_mdl_type, MDL_context*, bool) const+0x11f) [0x7f96eb752dbf]
/usr/sbin/mysqld(MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**)+0xf5) [0x7f96eb753995]
/usr/sbin/mysqld(MDL_context::acquire_lock(MDL_request*, double)+0x30) [0x7f96eb753e40]
/usr/sbin/mysqld(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double)+0xb8) [0x7f96eb7547e8]
/usr/sbin/mysqld(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool)+0x2081) [0x7f96eb70c7d1]
/usr/sbin/mysqld(Sql_cmd_alter_table::execute(THD*)+0x5d7) [0x7f96eb74e817]
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x140f) [0x7f96eb684a4f]
/usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x28d) [0x7f96eb68c8fd]
/usr/sbin/mysqld(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int)+0x1208) [0x7f96eb8bdea8]
/usr/sbin/mysqld(wsrep_apply_cb(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*)+0x36a) [0x7f96eb7a509a]
/usr/lib64/galera/libgalera_smm.so(galera::TrxHandle::apply(void*, wsrep_cb_status (*)(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*), wsrep_trx_meta const&) const+0xd3) [0x7f96e5d10063]
/usr/lib64/galera/libgalera_smm.so(+0x21d1c3) [0x7f96e5d471c3]
/usr/lib64/galera/libgalera_smm.so(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandle*)+0xa4) [0x7f96e5d49074]
/usr/lib64/galera/libgalera_smm.so(galera::ReplicatorSMM::process_trx(void*, galera::TrxHandle*)+0x40) [0x7f96e5d4a090]
/usr/lib64/galera/libgalera_smm.so(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&)+0x3f3) [0x7f96e5d2adf3]
/usr/lib64/galera/libgalera_smm.so(galera::GcsActionSource::process(void*, bool&)+0x5b) [0x7f96e5d2c1bb]
/usr/lib64/galera/libgalera_smm.so(galera::ReplicatorSMM::async_recv(void*)+0x6d) [0x7f96e5d4be5d]
/usr/lib64/galera/libgalera_smm.so(galera_recv+0x23) [0x7f96e5d5cc23]
/usr/sbin/mysqld(+0x57a172) [0x7f96eb7a6172]
/usr/sbin/mysqld(start_wsrep_THD+0x403) [0x7f96eb797733]
/lib64/libpthread.so.0(+0x3062a079d1) [0x7f96eadf49d1]
/lib64/libc.so.6(clone+0x6d) [0x7f96e92da9dd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f67b9464715): is an invalid pointer
Connection ID (thread ID): 8
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Hartmut Holzgraefe [ 2016-09-07 ]

Unlike in MDEV-9416, there were no messages in the error log right before the crash. The most recent message was logged 20 minutes earlier, when another ADD PARTITION failed because a partition by that name already existed. Similar errors had happened before, so I don't think this is related to the crash ...

Comment by Jan Lindström (Inactive) [ 2016-09-09 ]

http://lists.askmonty.org/pipermail/commits/2016-September/009836.html

But maybe you should do a proper fix for this, as I do not follow why trans->ha_list should contain
NULL handlertons. You have already talked with serg and if he says that this is fundamentally broken.

Comment by Nirbhay Choubey (Inactive) [ 2016-09-19 ]

jplindst serg: I think we'd be better off doing this the way 10.0-galera does (given that only one engine supports Galera at the moment):

ha_wsrep_abort_transaction(...) {
...
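  /* Look up the InnoDB handlerton directly. */
  handlerton *hton= installed_htons[DB_TYPE_INNODB];
  /* Abort only if the engine is loaded and provides the wsrep hook. */
  if (hton && hton->wsrep_abort_transaction)
  {
    hton->wsrep_abort_transaction(hton, bf_thd, victim_thd, signal);
  }
  else
  {
    WSREP_WARN("cannot abort InnoDB transaction");
  }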
...
}
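
The guarded dispatch in the snippet above can be tried out in isolation. What follows is only a minimal sketch under stated assumptions: `Thd`, `Handlerton`, `installed_engines`, `INNODB_SLOT`, `innodb_abort` and `abort_victim` are hypothetical stand-ins invented for illustration and are not the server's actual definitions. It shows just the pattern of checking both the engine pointer and its optional hook before calling through it, so that a missing engine or a missing hook yields a warning rather than a NULL dereference.

```cpp
#include <cstdio>

// Hypothetical stand-ins for illustration only; NOT MariaDB's real types.
struct Thd {
  const char* name;
  bool aborted;
};

struct Handlerton {
  // Optional hook: an engine without Galera support leaves this null.
  int (*wsrep_abort_transaction)(Handlerton* hton, Thd* bf_thd,
                                 Thd* victim_thd, bool signal);
};

// Assumed slot numbering for this sketch.
enum { UNKNOWN_SLOT = 0, INNODB_SLOT = 1, SLOT_COUNT = 2 };

// Simplified analogue of a fixed per-engine table; all slots start out null.
Handlerton* installed_engines[SLOT_COUNT] = {};

// Example hook that marks the victim transaction as aborted.
int innodb_abort(Handlerton*, Thd*, Thd* victim_thd, bool) {
  victim_thd->aborted = true;
  return 0;
}

// Returns true if the abort was dispatched to the engine,
// false if it was skipped with a warning.
bool abort_victim(Thd* bf_thd, Thd* victim_thd, bool signal) {
  Handlerton* hton = installed_engines[INNODB_SLOT];
  if (hton && hton->wsrep_abort_transaction) {
    hton->wsrep_abort_transaction(hton, bf_thd, victim_thd, signal);
    return true;
  }
  std::fprintf(stderr, "cannot abort InnoDB transaction\n");
  return false;
}
```

With an empty table, or with an engine registered without the hook, `abort_victim()` returns false and leaves the victim untouched. Once an engine with `innodb_abort` is registered in `INNODB_SLOT`, the call is dispatched and the victim is marked as aborted.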
