Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9416

MariaDB galera got signal 11 when altering table add unique index

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.1.9
    • Fix Version/s: 10.1.18
    • Labels:
    • Environment:
      Server version: 10.1.9-MariaDB-log MariaDB Server
      CentOS 6.6
      2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

      3 nodes of mariadb galera cluster
    • Sprint:
      10.1.18

      Description

      Updated at 2016/1/15
      From the stack trace and the code, it seems wsrep thread is trying to abort a QUERY_COMMITING?
      I'm not sure what does that mean, from the literal, wsrep is aborting a committing tran? So what's happened?
      Does the alter table fail cause this?(current node fail, but another node unique index created), Or
      alter table add unique index has been succefully commited but not flushed to disk yet, then wsrep
      failed to abort but hit signal 11 that caused the current node remains unchange?

      wsrep_grant_mdl_exception()
      ... ...
          else if (granted_thd->wsrep_query_state == QUERY_COMMITTING)
          {
            WSREP_DEBUG("MDL granted, but committing thd abort scheduled");
            ticket->wsrep_report(wsrep_debug);
            mysql_mutex_unlock(&granted_thd->LOCK_wsrep_thd);
            wsrep_abort_thd((void*)request_thd, (void*)granted_thd, 1);
            ret = FALSE;
      ha_abort_transaction()
      ... ...
        THD_TRANS *trans= &victim_thd->transaction.all;
        Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;
       
        for (; ha_info; ha_info= ha_info_next)
        {
          handlerton *hton= ha_info->ht();
          if (!hton->abort_transaction)               // <--- signal 11 should occur here
      

      I assume we are trying to abort the wsrep thread for this alter table command replication.
      It seems we don't regist the corrent storage engine info info the transaction
      ==============================================================================
      In an online enviroment, I need to modify the index type to be unique, so I fire an alter
      command

      ALTER TABLE `ht_user_loginfo`
      	DROP INDEX `Index_uniqueid`,
      	ADD UNIQUE INDEX `Index_uniqueid` (`uniqueid`)
      

      to my mariadb galera cluster 10.1.9 with 3 nodes (two data nodes and one arbitrator node),
      unfortunately, the mariadb server crashed, from the errorlog,I can see
      it htis signal 11,
      Then I check another data node of cluster, I found the unique index is created successfully,
      while current node iis failed.

      The status showed below is captured after I recovered the cluster from the crash.

      MariaDB [(none)]> show status like 'wsrep%';
      +------------------------------+-----------------------------------------------------+
      | Variable_name                | Value                                               |
      +------------------------------+-----------------------------------------------------+
      | wsrep_apply_oooe             | 0.695295                                            |
      | wsrep_apply_oool             | 0.001338                                            |
      | wsrep_apply_window           | 7.136138                                            |
      | wsrep_causal_reads           | 0                                                   |
      | wsrep_cert_deps_distance     | 116.807631                                          |
      | wsrep_cert_index_size        | 85                                                  |
      | wsrep_cert_interval          | 8.518146                                            |
      | wsrep_cluster_conf_id        | 3                                                   |
      | wsrep_cluster_size           | 3                                                   |
      | wsrep_cluster_state_uuid     | 41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd                |
      | wsrep_cluster_status         | Primary                                             |
      | wsrep_commit_oooe            | 0.000000                                            |
      | wsrep_commit_oool            | 0.000000                                            |
      | wsrep_commit_window          | 6.429097                                            |
      | wsrep_connected              | ON                                                  |
      | wsrep_evs_delayed            |                                                     |
      | wsrep_evs_evict_list         |                                                     |
      | wsrep_evs_repl_latency       | 0.000330851/0.000559662/0.00195228/0.000179164/1643 |
      | wsrep_evs_state              | OPERATIONAL                                         |
      | wsrep_flow_control_paused    | 0.009547                                            |
      | wsrep_flow_control_paused_ns | 1819437628239                                       |
      | wsrep_flow_control_recv      | 534776                                              |
      | wsrep_flow_control_sent      | 0                                                   |
      | wsrep_gcomm_uuid             | c6e2e2f9-ad43-11e5-874e-e29556fbb355                |
      | wsrep_incoming_addresses     | 192.168.6.101:3306,192.168.6.21:3306,               |
      | wsrep_last_committed         | 63491248                                            |
      | wsrep_local_bf_aborts        | 0                                                   |
      | wsrep_local_cached_downto    | 63263077                                            |
      | wsrep_local_cert_failures    | 13                                                  |
      | wsrep_local_commits          | 63491216                                            |
      | wsrep_local_index            | 1                                                   |
      | wsrep_local_recv_queue       | 0                                                   |
      | wsrep_local_recv_queue_avg   | 0.005204                                            |
      | wsrep_local_recv_queue_max   | 4                                                   |
      | wsrep_local_recv_queue_min   | 0                                                   |
      | wsrep_local_replays          | 0                                                   |
      | wsrep_local_send_queue       | 0                                                   |
      | wsrep_local_send_queue_avg   | 1.841741                                            |
      | wsrep_local_send_queue_max   | 62                                                  |
      | wsrep_local_send_queue_min   | 0                                                   |
      | wsrep_local_state            | 4                                                   |
      | wsrep_local_state_comment    | Synced                                              |
      | wsrep_local_state_uuid       | 41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd                |
      | wsrep_protocol_version       | 7                                                   |
      | wsrep_provider_name          | Galera                                              |
      | wsrep_provider_vendor        | Codership Oy <info@codership.com>                   |
      | wsrep_provider_version       | 25.3.9(r3385)                                       |
      | wsrep_ready                  | ON                                                  |
      | wsrep_received               | 620329                                              |
      | wsrep_received_bytes         | 4963695                                             |
      | wsrep_repl_data_bytes        | 31427131711                                         |
      | wsrep_repl_keys              | 272911424                                           |
      | wsrep_repl_keys_bytes        | 3643668604                                          |
      | wsrep_repl_other_bytes       | 0                                                   |
      | wsrep_replicated             | 63491247                                            |
      | wsrep_replicated_bytes       | 39134240123                                         |
      | wsrep_thread_count           | 2                                                   |
      +------------------------------+-----------------------------------------------------+
      

      below is the errorlog

      2016-01-12  9:35:24 140579879008000 [Warning] IP address '168.8.44.132' could not be resolved: Temporary failure in name resolution
      2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start
      2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start merge-sorting index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 20.0000
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting has estimated 32 runs
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 1 estimated 32 runs
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 2 estimated 16 runs
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 3 estimated 8 runs
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 4 estimated 4 runs
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 5 estimated 2 runs
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of  merge-sorting index `Index_uniqueid`--temporary-- (1 / 1)
      2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start building index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 30.0000
      2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of building index `Index_uniqueid`--temporary-- (1 / 1)
      2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Start applying row log
      2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of applying row log
      2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Completed
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
      2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
       
      160112 11:29:17 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      Server version: 10.1.9-MariaDB-log
      key_buffer_size=67108864
      read_buffer_size=1048576
      max_used_connections=171
      max_threads=3002
      thread_count=152
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27793254 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
       
      Thread pointer: 0x0x7fdb78869008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fdb4dc02dd8 thread_stack 0x48000
      mysys/stacktrace.c:247(my_print_stacktrace)[0xbf822e]
      sql/signal_handler.cc:160(handle_fatal_signal)[0x760e24]
      /lib64/libpthread.so.0[0x35c500f710]
      sql/handler.cc:6084(ha_abort_transaction(THD*, THD*, char))[0x7637ca]
      sql/wsrep_thd.cc:603(wsrep_abort_thd(void*, void*, char))[0x6ff448]
      sql/wsrep_mysqld.cc:1732(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x6f2423]
      sql/mdl.cc:1173(MDL_lock::Ticket_list::add_ticket(MDL_ticket*))[0x6928cf]
      psi/mysql_thread.h:662(inline_mysql_mutex_lock)[0x695580]
      sql/mdl.cc:2241(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double))[0x695c94]
      sql/sql_base.cc:1882(wait_while_table_is_used)[0x5612c7]
      sql/sql_table.cc:7068(mysql_inplace_alter_table)[0x637846]
      sql/sql_table.cc:8874(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x63fb0a]
      sql/sql_alter.cc:327(Sql_cmd_alter_table::execute(THD*))[0x68dff3]
      sql/sql_parse.cc:5677(mysql_execute_command(THD*))[0x5afa9c]
      sql/sql_parse.cc:7302(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b7ed9]
      sql/sql_parse.cc:7126(wsrep_mysql_parse)[0x5b7f68]
      sql/sql_parse.cc:1489(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ba035]
      sql/sql_parse.cc:1111(do_command(THD*))[0x5bb2f7]
      sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x689883]
      sql/sql_connect.cc:1263(handle_one_connection)[0x689aa2]
      perfschema/pfs.cc:1863(pfs_spawn_thread)[0xafa679]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fdb690c1020): is an invalid pointer
      Connection ID (thread ID): 5105
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      160112 11:29:19 mysqld_safe Number of processes running now: 0
      160112 11:29:19 mysqld_safe WSREP: not restarting wsrep node automatically
      160112 11:29:19 mysqld_safe mysqld from pid file /mysqldata/3306/mysqld.pid ended
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              serg Sergei Golubchik
              Reporter:
              wangcr Chengrui Wang
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: