Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.1.9
-
Server version: 10.1.9-MariaDB-log MariaDB Server
CentOS 6.6
2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
3 nodes of mariadb galera cluster
-
10.1.18
Description
Updated at 2016/1/15
From the stack trace and the code, it seems wsrep thread is trying to abort a QUERY_COMMITING?
I'm not sure what does that mean, from the literal, wsrep is aborting a committing tran? So what's happened?
Does the alter table fail cause this?(current node fail, but another node unique index created), Or
alter table add unique index has been succefully commited but not flushed to disk yet, then wsrep
failed to abort but hit signal 11 that caused the current node remains unchange?
wsrep_grant_mdl_exception()
|
... ...
|
else if (granted_thd->wsrep_query_state == QUERY_COMMITTING) |
{
|
WSREP_DEBUG("MDL granted, but committing thd abort scheduled"); |
ticket->wsrep_report(wsrep_debug);
|
mysql_mutex_unlock(&granted_thd->LOCK_wsrep_thd);
|
wsrep_abort_thd((void*)request_thd, (void*)granted_thd, 1); |
ret = FALSE;
|
ha_abort_transaction()
|
... ...
|
THD_TRANS *trans= &victim_thd->transaction.all;
|
Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;
|
|
for (; ha_info; ha_info= ha_info_next) |
{
|
handlerton *hton= ha_info->ht();
|
if (!hton->abort_transaction) // <--- signal 11 should occur here |
I assume we are trying to abort the wsrep thread for this alter table command replication.
It seems we don't regist the corrent storage engine info info the transaction
==============================================================================
In an online enviroment, I need to modify the index type to be unique, so I fire an alter
command
ALTER TABLE `ht_user_loginfo` |
DROP INDEX `Index_uniqueid`, |
ADD UNIQUE INDEX `Index_uniqueid` (`uniqueid`) |
to my mariadb galera cluster 10.1.9 with 3 nodes (two data nodes and one arbitrator node),
unfortunately, the mariadb server crashed, from the errorlog,I can see
it htis signal 11,
Then I check another data node of cluster, I found the unique index is created successfully,
while current node iis failed.
The status showed below is captured after I recovered the cluster from the crash.
MariaDB [(none)]> show status like 'wsrep%';
|
+------------------------------+-----------------------------------------------------+
|
| Variable_name | Value |
|
+------------------------------+-----------------------------------------------------+
|
| wsrep_apply_oooe | 0.695295 |
|
| wsrep_apply_oool | 0.001338 |
|
| wsrep_apply_window | 7.136138 |
|
| wsrep_causal_reads | 0 |
|
| wsrep_cert_deps_distance | 116.807631 |
|
| wsrep_cert_index_size | 85 |
|
| wsrep_cert_interval | 8.518146 |
|
| wsrep_cluster_conf_id | 3 |
|
| wsrep_cluster_size | 3 |
|
| wsrep_cluster_state_uuid | 41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd |
|
| wsrep_cluster_status | Primary |
|
| wsrep_commit_oooe | 0.000000 |
|
| wsrep_commit_oool | 0.000000 |
|
| wsrep_commit_window | 6.429097 |
|
| wsrep_connected | ON |
|
| wsrep_evs_delayed | |
|
| wsrep_evs_evict_list | |
|
| wsrep_evs_repl_latency | 0.000330851/0.000559662/0.00195228/0.000179164/1643 |
|
| wsrep_evs_state | OPERATIONAL |
|
| wsrep_flow_control_paused | 0.009547 |
|
| wsrep_flow_control_paused_ns | 1819437628239 |
|
| wsrep_flow_control_recv | 534776 |
|
| wsrep_flow_control_sent | 0 |
|
| wsrep_gcomm_uuid | c6e2e2f9-ad43-11e5-874e-e29556fbb355 |
|
| wsrep_incoming_addresses | 192.168.6.101:3306,192.168.6.21:3306, |
|
| wsrep_last_committed | 63491248 |
|
| wsrep_local_bf_aborts | 0 |
|
| wsrep_local_cached_downto | 63263077 |
|
| wsrep_local_cert_failures | 13 |
|
| wsrep_local_commits | 63491216 |
|
| wsrep_local_index | 1 |
|
| wsrep_local_recv_queue | 0 |
|
| wsrep_local_recv_queue_avg | 0.005204 |
|
| wsrep_local_recv_queue_max | 4 |
|
| wsrep_local_recv_queue_min | 0 |
|
| wsrep_local_replays | 0 |
|
| wsrep_local_send_queue | 0 |
|
| wsrep_local_send_queue_avg | 1.841741 |
|
| wsrep_local_send_queue_max | 62 |
|
| wsrep_local_send_queue_min | 0 |
|
| wsrep_local_state | 4 |
|
| wsrep_local_state_comment | Synced |
|
| wsrep_local_state_uuid | 41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd |
|
| wsrep_protocol_version | 7 |
|
| wsrep_provider_name | Galera |
|
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
|
| wsrep_provider_version | 25.3.9(r3385) |
|
| wsrep_ready | ON |
|
| wsrep_received | 620329 |
|
| wsrep_received_bytes | 4963695 |
|
| wsrep_repl_data_bytes | 31427131711 |
|
| wsrep_repl_keys | 272911424 |
|
| wsrep_repl_keys_bytes | 3643668604 |
|
| wsrep_repl_other_bytes | 0 |
|
| wsrep_replicated | 63491247 |
|
| wsrep_replicated_bytes | 39134240123 |
|
| wsrep_thread_count | 2 |
|
+------------------------------+-----------------------------------------------------+
|
below is the errorlog
2016-01-12 9:35:24 140579879008000 [Warning] IP address '168.8.44.132' could not be resolved: Temporary failure in name resolution
|
2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start
|
2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start merge-sorting index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 20.0000
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting has estimated 32 runs
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 1 estimated 32 runs
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 2 estimated 16 runs
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 3 estimated 8 runs
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 4 estimated 4 runs
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 5 estimated 2 runs
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of merge-sorting index `Index_uniqueid`--temporary-- (1 / 1)
|
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start building index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 30.0000
|
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of building index `Index_uniqueid`--temporary-- (1 / 1)
|
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Start applying row log
|
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of applying row log
|
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Completed
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
|
|
160112 11:29:17 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
|
|
Server version: 10.1.9-MariaDB-log
|
key_buffer_size=67108864
|
read_buffer_size=1048576
|
max_used_connections=171
|
max_threads=3002
|
thread_count=152
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27793254 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
|
Thread pointer: 0x0x7fdb78869008
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7fdb4dc02dd8 thread_stack 0x48000
|
mysys/stacktrace.c:247(my_print_stacktrace)[0xbf822e]
|
sql/signal_handler.cc:160(handle_fatal_signal)[0x760e24]
|
/lib64/libpthread.so.0[0x35c500f710]
|
sql/handler.cc:6084(ha_abort_transaction(THD*, THD*, char))[0x7637ca]
|
sql/wsrep_thd.cc:603(wsrep_abort_thd(void*, void*, char))[0x6ff448]
|
sql/wsrep_mysqld.cc:1732(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x6f2423]
|
sql/mdl.cc:1173(MDL_lock::Ticket_list::add_ticket(MDL_ticket*))[0x6928cf]
|
psi/mysql_thread.h:662(inline_mysql_mutex_lock)[0x695580]
|
sql/mdl.cc:2241(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double))[0x695c94]
|
sql/sql_base.cc:1882(wait_while_table_is_used)[0x5612c7]
|
sql/sql_table.cc:7068(mysql_inplace_alter_table)[0x637846]
|
sql/sql_table.cc:8874(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x63fb0a]
|
sql/sql_alter.cc:327(Sql_cmd_alter_table::execute(THD*))[0x68dff3]
|
sql/sql_parse.cc:5677(mysql_execute_command(THD*))[0x5afa9c]
|
sql/sql_parse.cc:7302(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b7ed9]
|
sql/sql_parse.cc:7126(wsrep_mysql_parse)[0x5b7f68]
|
sql/sql_parse.cc:1489(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ba035]
|
sql/sql_parse.cc:1111(do_command(THD*))[0x5bb2f7]
|
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x689883]
|
sql/sql_connect.cc:1263(handle_one_connection)[0x689aa2]
|
perfschema/pfs.cc:1863(pfs_spawn_thread)[0xafa679]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7fdb690c1020): is an invalid pointer
|
Connection ID (thread ID): 5105
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
|
|
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
information that should help you find out what is causing the crash.
|
160112 11:29:19 mysqld_safe Number of processes running now: 0
|
160112 11:29:19 mysqld_safe WSREP: not restarting wsrep node automatically
|
160112 11:29:19 mysqld_safe mysqld from pid file /mysqldata/3306/mysqld.pid ended
|
Attachments
Issue Links
- duplicates
-
MDEV-11147 Crash during ALTER TABLE DROP PARTITION
-
- Closed
-
-
MDEV-11266 MariaDB crashes every other day
-
- Closed
-
- is duplicated by
-
MDEV-10768 Galera segfault on transaction abort after failed wsrep_grant_mdl_exception()
-
- Closed
-
-
MDEV-10835 Denial Of Service - Crash Any Galera Node - Race Condition: CREATE TABLE IF NOT EXISTS / INSERT
-
- Closed
-
- relates to
-
MDEV-10732 buffer overflow detected - mariadb crashing regularly
-
- Closed
-
- links to
Hit this problem again on our online system
2016-05-05 17:34:22 139671025166080 [Warning] WSREP: Cannot abort transaction.
2016-05-05 17:34:22 139671025166080 [Warning] WSREP: Cannot abort transaction.
160505 17:34:22 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.9-MariaDB-log
key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=366
max_threads=3002
thread_count=108
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27793254 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x7f0742dcf008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f07b1d71dd8 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbf822e]
sql/signal_handler.cc:160(handle_fatal_signal)[0x760e24]
/lib64/libpthread.so.0[0x3ec1c0f710]
sql/handler.cc:6084(ha_abort_transaction(THD*, THD*, char))[0x7637ca]
sql/wsrep_thd.cc:603(wsrep_abort_thd(void*, void*, char))[0x6ff448]
sql/wsrep_mysqld.cc:1732(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x6f2423]
sql/mdl.cc:1173(MDL_lock::Ticket_list::add_ticket(MDL_ticket*))[0x6928cf]
psi/mysql_thread.h:662(inline_mysql_mutex_lock)[0x695580]
sql/mdl.cc:2241(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double))[0x695c94]
sql/sql_base.cc:1882(wait_while_table_is_used)[0x5612c7]
sql/sql_table.cc:7068(mysql_inplace_alter_table)[0x637846]
sql/sql_table.cc:8874(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x63fb0a]
sql/sql_alter.cc:327(Sql_cmd_alter_table::execute(THD*))[0x68dff3]
sql/sql_parse.cc:5677(mysql_execute_command(THD*))[0x5afa9c]
sql/sql_parse.cc:7302(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b7ed9]
sql/sql_parse.cc:7126(wsrep_mysql_parse)[0x5b7f68]
sql/sql_parse.cc:1489(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ba035]
sql/sql_parse.cc:1111(do_command(THD*))[0x5bb2f7]
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x689883]
sql/sql_connect.cc:1263(handle_one_connection)[0x689aa2]
perfschema/pfs.cc:1863(pfs_spawn_thread)[0xafa679]
/lib64/libpthread.so.0[0x3ec1c079d1]
/lib64/libc.so.6(clone+0x6d)[0x3ec18e89dd]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f078fa10020): is an invalid pointer
Connection ID (thread ID): 162514
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160505 17:34:24 mysqld_safe Number of processes running now: 0
160505 17:34:24 mysqld_safe WSREP: not restarting wsrep node automatically
160505 17:34:24 mysqld_safe mysqld from pid file /mysqldata/3306/mysqld.pid ended