[MDEV-9416] MariaDB galera got signal 11 when altering table add unique index Created: 2016-01-14  Updated: 2018-05-16  Resolved: 2016-09-29

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table
Affects Version/s: 10.1.9
Fix Version/s: 10.1.18

Type: Bug Priority: Critical
Reporter: Chengrui Wang Assignee: Sergei Golubchik
Resolution: Fixed Votes: 1
Labels: replication
Environment:

Server version: 10.1.9-MariaDB-log MariaDB Server
CentOS 6.6
2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

3 nodes of mariadb galera cluster


Issue Links:
Duplicate
duplicates MDEV-11147 Crash during ALTER TABLE DROP PARTITION Closed
duplicates MDEV-11266 MariaDB crashes every other day Closed
is duplicated by MDEV-10768 Galera segfault on transaction abort ... Closed
is duplicated by MDEV-10835 Denial Of Service - Crash Any Galera ... Closed
Relates
relates to MDEV-10732 buffer overflow detected - mariadb cr... Closed
Sprint: 10.1.18

 Description   

Updated at 2016/1/15
From the stack trace and the code, it seems wsrep thread is trying to abort a QUERY_COMMITING?
I'm not sure what does that mean, from the literal, wsrep is aborting a committing tran? So what's happened?
Does the alter table fail cause this?(current node fail, but another node unique index created), Or
alter table add unique index has been succefully commited but not flushed to disk yet, then wsrep
failed to abort but hit signal 11 that caused the current node remains unchange?

wsrep_grant_mdl_exception()
... ...
    else if (granted_thd->wsrep_query_state == QUERY_COMMITTING)
    {
      WSREP_DEBUG("MDL granted, but committing thd abort scheduled");
      ticket->wsrep_report(wsrep_debug);
      mysql_mutex_unlock(&granted_thd->LOCK_wsrep_thd);
      wsrep_abort_thd((void*)request_thd, (void*)granted_thd, 1);
      ret = FALSE;
ha_abort_transaction()
... ...
  THD_TRANS *trans= &victim_thd->transaction.all;
  Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;
 
  for (; ha_info; ha_info= ha_info_next)
  {
    handlerton *hton= ha_info->ht();
    if (!hton->abort_transaction)               // <--- signal 11 should occur here

I assume we are trying to abort the wsrep thread for this alter table command replication.
It seems we don't regist the corrent storage engine info info the transaction
==============================================================================
In an online enviroment, I need to modify the index type to be unique, so I fire an alter
command

ALTER TABLE `ht_user_loginfo`
	DROP INDEX `Index_uniqueid`,
	ADD UNIQUE INDEX `Index_uniqueid` (`uniqueid`)

to my mariadb galera cluster 10.1.9 with 3 nodes (two data nodes and one arbitrator node),
unfortunately, the mariadb server crashed, from the errorlog,I can see
it htis signal 11,
Then I check another data node of cluster, I found the unique index is created successfully,
while current node iis failed.

The status showed below is captured after I recovered the cluster from the crash.

MariaDB [(none)]> show status like 'wsrep%';
+------------------------------+-----------------------------------------------------+
| Variable_name                | Value                                               |
+------------------------------+-----------------------------------------------------+
| wsrep_apply_oooe             | 0.695295                                            |
| wsrep_apply_oool             | 0.001338                                            |
| wsrep_apply_window           | 7.136138                                            |
| wsrep_causal_reads           | 0                                                   |
| wsrep_cert_deps_distance     | 116.807631                                          |
| wsrep_cert_index_size        | 85                                                  |
| wsrep_cert_interval          | 8.518146                                            |
| wsrep_cluster_conf_id        | 3                                                   |
| wsrep_cluster_size           | 3                                                   |
| wsrep_cluster_state_uuid     | 41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd                |
| wsrep_cluster_status         | Primary                                             |
| wsrep_commit_oooe            | 0.000000                                            |
| wsrep_commit_oool            | 0.000000                                            |
| wsrep_commit_window          | 6.429097                                            |
| wsrep_connected              | ON                                                  |
| wsrep_evs_delayed            |                                                     |
| wsrep_evs_evict_list         |                                                     |
| wsrep_evs_repl_latency       | 0.000330851/0.000559662/0.00195228/0.000179164/1643 |
| wsrep_evs_state              | OPERATIONAL                                         |
| wsrep_flow_control_paused    | 0.009547                                            |
| wsrep_flow_control_paused_ns | 1819437628239                                       |
| wsrep_flow_control_recv      | 534776                                              |
| wsrep_flow_control_sent      | 0                                                   |
| wsrep_gcomm_uuid             | c6e2e2f9-ad43-11e5-874e-e29556fbb355                |
| wsrep_incoming_addresses     | 192.168.6.101:3306,192.168.6.21:3306,               |
| wsrep_last_committed         | 63491248                                            |
| wsrep_local_bf_aborts        | 0                                                   |
| wsrep_local_cached_downto    | 63263077                                            |
| wsrep_local_cert_failures    | 13                                                  |
| wsrep_local_commits          | 63491216                                            |
| wsrep_local_index            | 1                                                   |
| wsrep_local_recv_queue       | 0                                                   |
| wsrep_local_recv_queue_avg   | 0.005204                                            |
| wsrep_local_recv_queue_max   | 4                                                   |
| wsrep_local_recv_queue_min   | 0                                                   |
| wsrep_local_replays          | 0                                                   |
| wsrep_local_send_queue       | 0                                                   |
| wsrep_local_send_queue_avg   | 1.841741                                            |
| wsrep_local_send_queue_max   | 62                                                  |
| wsrep_local_send_queue_min   | 0                                                   |
| wsrep_local_state            | 4                                                   |
| wsrep_local_state_comment    | Synced                                              |
| wsrep_local_state_uuid       | 41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd                |
| wsrep_protocol_version       | 7                                                   |
| wsrep_provider_name          | Galera                                              |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                   |
| wsrep_provider_version       | 25.3.9(r3385)                                       |
| wsrep_ready                  | ON                                                  |
| wsrep_received               | 620329                                              |
| wsrep_received_bytes         | 4963695                                             |
| wsrep_repl_data_bytes        | 31427131711                                         |
| wsrep_repl_keys              | 272911424                                           |
| wsrep_repl_keys_bytes        | 3643668604                                          |
| wsrep_repl_other_bytes       | 0                                                   |
| wsrep_replicated             | 63491247                                            |
| wsrep_replicated_bytes       | 39134240123                                         |
| wsrep_thread_count           | 2                                                   |
+------------------------------+-----------------------------------------------------+

below is the errorlog

2016-01-12  9:35:24 140579879008000 [Warning] IP address '168.8.44.132' could not be resolved: Temporary failure in name resolution
2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start
2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start merge-sorting index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 20.0000
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting has estimated 32 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 1 estimated 32 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 2 estimated 16 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 3 estimated 8 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 4 estimated 4 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 5 estimated 2 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of  merge-sorting index `Index_uniqueid`--temporary-- (1 / 1)
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start building index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 30.0000
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of building index `Index_uniqueid`--temporary-- (1 / 1)
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Start applying row log
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of applying row log
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Completed
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
 
160112 11:29:17 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
Server version: 10.1.9-MariaDB-log
key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=171
max_threads=3002
thread_count=152
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27793254 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
 
Thread pointer: 0x0x7fdb78869008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fdb4dc02dd8 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbf822e]
sql/signal_handler.cc:160(handle_fatal_signal)[0x760e24]
/lib64/libpthread.so.0[0x35c500f710]
sql/handler.cc:6084(ha_abort_transaction(THD*, THD*, char))[0x7637ca]
sql/wsrep_thd.cc:603(wsrep_abort_thd(void*, void*, char))[0x6ff448]
sql/wsrep_mysqld.cc:1732(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x6f2423]
sql/mdl.cc:1173(MDL_lock::Ticket_list::add_ticket(MDL_ticket*))[0x6928cf]
psi/mysql_thread.h:662(inline_mysql_mutex_lock)[0x695580]
sql/mdl.cc:2241(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double))[0x695c94]
sql/sql_base.cc:1882(wait_while_table_is_used)[0x5612c7]
sql/sql_table.cc:7068(mysql_inplace_alter_table)[0x637846]
sql/sql_table.cc:8874(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x63fb0a]
sql/sql_alter.cc:327(Sql_cmd_alter_table::execute(THD*))[0x68dff3]
sql/sql_parse.cc:5677(mysql_execute_command(THD*))[0x5afa9c]
sql/sql_parse.cc:7302(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b7ed9]
sql/sql_parse.cc:7126(wsrep_mysql_parse)[0x5b7f68]
sql/sql_parse.cc:1489(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ba035]
sql/sql_parse.cc:1111(do_command(THD*))[0x5bb2f7]
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x689883]
sql/sql_connect.cc:1263(handle_one_connection)[0x689aa2]
perfschema/pfs.cc:1863(pfs_spawn_thread)[0xafa679]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fdb690c1020): is an invalid pointer
Connection ID (thread ID): 5105
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160112 11:29:19 mysqld_safe Number of processes running now: 0
160112 11:29:19 mysqld_safe WSREP: not restarting wsrep node automatically
160112 11:29:19 mysqld_safe mysqld from pid file /mysqldata/3306/mysqld.pid ended



 Comments   
Comment by Chengrui Wang [ 2016-05-06 ]

Hit this problem again on our online system

2016-05-05 17:34:22 139671025166080 [Warning] WSREP: Cannot abort transaction.
2016-05-05 17:34:22 139671025166080 [Warning] WSREP: Cannot abort transaction.
160505 17:34:22 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.9-MariaDB-log
key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=366
max_threads=3002
thread_count=108
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27793254 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f0742dcf008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f07b1d71dd8 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbf822e]
sql/signal_handler.cc:160(handle_fatal_signal)[0x760e24]
/lib64/libpthread.so.0[0x3ec1c0f710]
sql/handler.cc:6084(ha_abort_transaction(THD*, THD*, char))[0x7637ca]
sql/wsrep_thd.cc:603(wsrep_abort_thd(void*, void*, char))[0x6ff448]
sql/wsrep_mysqld.cc:1732(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x6f2423]
sql/mdl.cc:1173(MDL_lock::Ticket_list::add_ticket(MDL_ticket*))[0x6928cf]
psi/mysql_thread.h:662(inline_mysql_mutex_lock)[0x695580]
sql/mdl.cc:2241(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double))[0x695c94]
sql/sql_base.cc:1882(wait_while_table_is_used)[0x5612c7]
sql/sql_table.cc:7068(mysql_inplace_alter_table)[0x637846]
sql/sql_table.cc:8874(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x63fb0a]
sql/sql_alter.cc:327(Sql_cmd_alter_table::execute(THD*))[0x68dff3]
sql/sql_parse.cc:5677(mysql_execute_command(THD*))[0x5afa9c]
sql/sql_parse.cc:7302(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b7ed9]
sql/sql_parse.cc:7126(wsrep_mysql_parse)[0x5b7f68]
sql/sql_parse.cc:1489(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ba035]
sql/sql_parse.cc:1111(do_command(THD*))[0x5bb2f7]
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x689883]
sql/sql_connect.cc:1263(handle_one_connection)[0x689aa2]
perfschema/pfs.cc:1863(pfs_spawn_thread)[0xafa679]
/lib64/libpthread.so.0[0x3ec1c079d1]
/lib64/libc.so.6(clone+0x6d)[0x3ec18e89dd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f078fa10020): is an invalid pointer
Connection ID (thread ID): 162514
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160505 17:34:24 mysqld_safe Number of processes running now: 0
160505 17:34:24 mysqld_safe WSREP: not restarting wsrep node automatically
160505 17:34:24 mysqld_safe mysqld from pid file /mysqldata/3306/mysqld.pid ended

Comment by Hartmut Holzgraefe [ 2016-09-08 ]

Crash seems to be due to a an invalid handlerton pointer on line 6084:

    6069 int ha_abort_transaction(THD *bf_thd, THD *victim_thd, my_bool signal)
    6070 {
    6071   DBUG_ENTER("ha_abort_transaction");
    6072   if (!WSREP(bf_thd) &&
    6073       !(bf_thd->variables.wsrep_OSU_method == WSREP_OSU_RSU &&
    6074         bf_thd->wsrep_exec_mode == TOTAL_ORDER)) {
    6075     DBUG_RETURN(0);
    6076   }
    6077 
    6078   THD_TRANS *trans= &victim_thd->transaction.all;
    6079   Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;
    6080 
    6081   for (; ha_info; ha_info= ha_info_next)
    6082   {
    6083     handlerton *hton= ha_info->ht();
>>> 6084     if (!hton->abort_transaction)
    6085     {
    6086       /* Skip warning for binlog SE */
    6087       if (hton->db_type != DB_TYPE_BINLOG)
    6088       {
    6089         WSREP_WARN("Cannot abort transaction.");
    6090       }
    6091     }
    6092     else
    6093     {
    6094       hton->abort_transaction(hton, bf_thd, victim_thd, signal);
    6095     }
    6096     ha_info_next= ha_info->next();
    6097   }  
    6098   DBUG_RETURN(0);
    6099 }

Comment by Nirbhay Choubey (Inactive) [ 2016-09-24 ]

http://lists.askmonty.org/pipermail/commits/2016-September/009914.html

Generated at Thu Feb 08 07:34:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.