Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17813

Crash in instant ALTER TABLE due to purge concurrently emptying table

    XMLWordPrintable

Details

    Description

      The following deterministic test case reproduces a problem that was reported in MDEV-17721:

      --source include/have_innodb.inc
      --source include/have_debug.inc
      --source include/have_debug_sync.inc
       
      SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
      SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
       
      CREATE TABLE t1 (f2 INT) ENGINE=InnoDB;
      INSERT INTO t1 SET f2=1;
      ALTER TABLE t1 ADD COLUMN f1 INT;
       
      connect (purge_control,localhost,root);
      START TRANSACTION WITH CONSISTENT SNAPSHOT;
       
      connection default;
      DELETE FROM t1;
       
      SET DEBUG_SYNC='innodb_commit_inplace_alter_table_enter SIGNAL go WAIT_FOR do';
      send ALTER TABLE t1 ADD COLUMN f3 INT;
       
      connection purge_control;
      SET DEBUG_SYNC='now WAIT_FOR go';
      COMMIT;
      --source include/wait_all_purged.inc
      SET DEBUG_SYNC='now SIGNAL do';
      disconnect purge_control;
       
      connection default;
      reap;
      SET DEBUG_SYNC=RESET;
      DROP TABLE t1;
      SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
      

      This does not crash in 10.3, so MDEV-11369 does not seem to be affected.
      The original test did not do ALTER TABLE t1 ADD COLUMN f3 INT but the following, which would only be instantaneous after MDEV-15562:

      send ALTER TABLE t1 DROP f2;
      

      The crash looks like this:

      10.4 46a411088c5abbacda4e52e89f4dbcf52a451f7a

      mysqld: /mariadb/10.4/storage/innobase/handler/handler0alter.cc:338: void dict_index_t::instant_add_field(const dict_index_t&): Assertion `instant.n_core_fields == n_core_fields' failed.
      ...
      #4  0x0000555556186653 in dict_index_t::instant_add_field (
          this=0x7fffa80238d8, instant=...)
          at /mariadb/10.4/storage/innobase/handler/handler0alter.cc:338
      #5  0x0000555556187a6d in dict_table_t::instant_column (this=0x7fffa8020348, 
          table=..., col_map=0x7fffa8035de8)
          at /mariadb/10.4/storage/innobase/handler/handler0alter.cc:487
      #6  0x0000555556170908 in innobase_instant_try (ha_alter_info=0x7ffff41f2b60, 
          ctx=0x7fffa80176b0, altered_table=0x7fffa8036bd8, table=0x7fffa8025e78, 
          trx=0x7ffff4cec300)
          at /mariadb/10.4/storage/innobase/handler/handler0alter.cc:5294
      

      In order to fix this, we must rebuild the ctx->instant_table metadata in innobase_instant_try() if ctx->old_table->is_instant() != ctx->old_was_instant. And while doing that, we must already be holding an exclusive latch on the leftmost leaf page, so that purge cannot empty the table. This implies that the following code has to be moved earlier in the function:

      	mtr_t mtr;
      	mtr.start();
      	index->set_modified(mtr);
      	btr_pcur_t pcur;
      	btr_pcur_open_at_index_side(true, index, BTR_MODIFY_TREE, &pcur, true,
      				    0, &mtr);
      

      Furthermore, there appears to be a possible race condition earlier during the ALTER TABLE. Both during instant_alter_column_possible() and ctx->prepare_instant(), a purge thread could invoke dict_index_t::clear_instant_add() when the table becomes empty. That should not be a problem in MDEV-11369 in 10.3, but it is serious in 10.4, because index fields in the current dict_table_t object can be removed or permuted. It ought to be possible to prevent this by holding index->lock in S mode, because that should prevent modifications of the root page, which is the page that needs to be modified when the table becomes empty.

      Because of partitioned tables, I think that each inplace_alter function in ha_innobase must acquire and release index->lock independently. I do not see any way to avoid rebuilding the the ctx->instant_table metadata.

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.