MariaDB Server / MDEV-5901

EITS: killing the server leaves statistical tables in "marked as crashed" state

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.9
    • Fix Version/s: 10.0.10
    • None

    Description

      If one performs the following sequence of operations:

      • perform some action that updates the statistical tables (e.g. ANALYZE TABLE ... PERSISTENT FOR ALL)
      • kill the server
      • start the server again

      then any action that attempts to read from the EITS tables will no longer be able to open them. Opening the table fails with a "table marked as crashed" error.

      This task is about making the EITS tables more resilient to this scenario.

      There are two things to be done:
      1. Flush a statistical table to disk as soon as we have made any modification to it (similar to what is done for mysql.proc).
      2. Enable auto-repair for statistical tables, as already happens with regular MyISAM tables.

        Activity

          Hint from Monty: check out the code in sp.cc:

              if (table->file->ha_write_row(table->record[0]))
                ret= SP_WRITE_ROW_FAILED;
              /* Make change permanent and avoid 'table is marked as crashed' errors */
              table->file->extra(HA_EXTRA_FLUSH);

          Note the HA_EXTRA_FLUSH call. We will need to add it to EITS tables.
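The write-then-flush pattern from sp.cc can be illustrated with a small toy model. This is only a sketch: `MockTable`, `EXTRA_FLUSH`, `write_stat_row` and `looks_crashed_after_kill` are hypothetical stand-ins invented for illustration, not MariaDB's `handler` API, and the real MyISAM bookkeeping is more involved than a single dirty flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for HA_EXTRA_FLUSH; not the server's enum.
enum ExtraOp { EXTRA_FLUSH };

// Toy model of a non-transactional table (assumption, not MariaDB code).
struct MockTable {
  std::vector<std::string> buffered;  // rows written but not yet durable
  std::vector<std::string> on_disk;   // rows that would survive a kill
  bool dirty = false;                 // unflushed changes are pending

  // Models ha_write_row(): returns 0 on success.
  int write_row(const std::string &row) {
    buffered.push_back(row);
    dirty = true;
    return 0;
  }

  // Models extra(HA_EXTRA_FLUSH): make pending changes permanent.
  void extra(ExtraOp) {
    on_disk.insert(on_disk.end(), buffered.begin(), buffered.end());
    buffered.clear();
    dirty = false;
  }

  // If the server were killed now, would the table be found in an
  // inconsistent ("marked as crashed") state on the next open?
  bool looks_crashed_after_kill() const { return dirty; }
};

// Write one statistics row and flush immediately, mirroring the sp.cc pattern.
int write_stat_row(MockTable &table, const std::string &row) {
  int err = table.write_row(row);
  if (err)
    return err;
  table.extra(EXTRA_FLUSH);  // avoid 'table is marked as crashed' after a kill
  return 0;
}
```

In this model, a table written through `write_stat_row()` is never left dirty, while a bare `write_row()` followed by a kill is.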

          (Comment added by psergei, Sergei Petrunia.)

          I'm also trying to investigate what is needed for auto-repair.

          1. Auto-repair doesn't work for the mysql.proc table.
          I run CREATE PROCEDURE ... and kill the server right after ha_myisam::write_row(). If I then restart the server and attempt to use mysql.proc again, I get:

          mysql> create procedure p4() begin select now(); end //
          ERROR 145 (HY000): Table './mysql/proc' is marked as crashed and should be repaired

          2. Auto-repair does work for regular tables.

          mysql> insert into t21 values (2);
          ERROR 2013 (HY000): Lost connection to MySQL server during query  
          ^^ -- I intentionally kill the server
           
          mysql> select * from t21;
          ERROR 2006 (HY000): MySQL server has gone away
          No connection. Trying to reconnect...
          Connection id:    3
          Current database: test
           
          +------+
          | a    |
          +------+
          |    1 |
          |    2 |
          +------+
          2 rows in set, 7 warnings (18 min 49.03 sec)
           
          mysql> show warnings\G
          Message: Table './test/t21' is marked as crashed and should be repaired
          Message: Table 't21' is marked as crashed and should be repaired
          Message: 1 client is using or hasn't closed the table properly
          Message: Size of datafile is: 14       Should be: 7
          Message: Record-count is not ok; is 2   Should be: 1
          Message: Found 2 key parts. Should be: 1
          Message: Number of rows changed from 1 to 2
          7 rows in set (0.00 sec)

          Code-wise, auto-repair happens in open_table() and open_tables(). In open_table(), there is this code:

                else if (share->crashed)
                  (void) ot_ctx->request_backoff_action(Open_table_context::OT_REPAIR,
                                                        table_list);

          and open_tables() has:

                error= open_and_process_table(thd, thd->lex, tables, counter,
                                              flags, prelocking_strategy,
                                              has_prelocking_list, &ot_ctx,
                                              &new_frm_mem);
           
                if (error)
                {
                  if (ot_ctx.can_recover_from_failed_open())

          (Comment added by psergei, Sergei Petrunia.)

          When I try debugging a failure to open a statistical table, I see a difference in this call:

          Open_table_context::request_backoff_action (this=0x7ffff7e9ede0, action_arg=Open_table_context::OT_REPAIR,

          Here,

          (gdb) print action_arg
          $312 = Open_table_context::OT_REPAIR
          (gdb) print m_has_locks
          $313 = true

          and because m_has_locks is true, we don't take any action: the repair is never scheduled.

          (Comment added by psergei, Sergei Petrunia.)

          The reason is that open_and_lock_tables() is structured like this:

            if (open_tables(thd, &tables, &counter, flags, prelocking_strategy))
              goto err;
            ...
            if (lock_tables(thd, tables, counter, flags))
              goto err;
           
            (void) read_statistics_for_tables_if_needed(thd, tables);

          Statistical tables are opened after the regular tables have been opened and locked (I wonder why we can't open them at the same time). Because locks are already held at that point, the deadlock-prevention logic refuses the repair.
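The effect of that ordering can be sketched with a simplified model. The names below (`OpenContext`, `Action`, `open_crashed_table`) are hypothetical and only echo the server's naming. The refusal rule is an assumption based on the gdb observation in this thread (a repair request is ignored when `m_has_locks` is true), not a copy of the real `Open_table_context` code.

```cpp
#include <cassert>

// Hypothetical, simplified model of the back-off decision (not server code).
enum class Action { None, Repair };

class OpenContext {
 public:
  explicit OpenContext(bool has_locks) : has_locks_(has_locks) {}

  // Returns true if the request was refused.
  // Assumption: any back-off is refused while the caller holds table locks,
  // because releasing and re-acquiring them could lead to a deadlock.
  bool request_backoff_action(Action action) {
    if (has_locks_)
      return true;
    pending_ = action;
    return false;
  }

  // True if a recovery action (here: repair) has been scheduled.
  bool can_recover_from_failed_open() const {
    return pending_ != Action::None;
  }

 private:
  bool has_locks_;
  Action pending_ = Action::None;
};

// Attempt to open a crashed table; returns true if a repair was scheduled.
bool open_crashed_table(bool caller_holds_locks) {
  OpenContext ctx(caller_holds_locks);
  (void) ctx.request_backoff_action(Action::Repair);
  return ctx.can_recover_from_failed_open();
}
```

Under these assumptions, a regular table opened before any locks are taken gets its repair scheduled, while a statistical table opened after lock_tables() does not.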

          (Comment added by psergei, Sergei Petrunia.)

          If I force execution in Open_table_context::request_backoff_action() to allow the repair, I hit this assertion:

          thd->mdl_context.is_lock_owner(MDL_key::TABLE, table->s->db.str, table->s->table_name.str, MDL_SHARED)

          in close_thread_table() for table test.t10.

          (Comment added by psergei, Sergei Petrunia.)

          It seems auto-repair (item #2) is difficult to do. For now, I will only implement flushing (item #1).

          (Comment added by psergei, Sergei Petrunia.)

          People

            Assignee: psergei Sergei Petrunia
            Reporter: psergei Sergei Petrunia