Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36493

Atomic CREATE OR REPLACE ... SELECT blocks InnoDB purge

Details

    Description

      As marko noticed and tests confirm, the atomic CREATE OR REPLACE ... SELECT blocks InnoDB purge for the whole duration of the query. I don't know if there is such a requirement that it mustn't, but as I understand, marko considers it to be a problem, so I'm filing it.

      Here is a demo in MTR format. It shouldn't go to the regression suite of course.

      --source include/have_sequence.inc
      --source include/have_innodb.inc
       
      --connect (con1,localhost,root,,)
      # This is to create some activity worth purging
      --delimiter $
      CREATE TABLE x (id INT PRIMARY KEY, val TEXT) ENGINE=InnoDB $
      --send
        LOOP
          START TRANSACTION;
          INSERT INTO x VALUES (1, 'abc');
          DELETE FROM x WHERE id = 1;
          COMMIT;
        END LOOP $
      --delimiter ;
       
      --connection default
      CREATE TABLE cor (a INT) ENGINE=InnoDB;
      --send CREATE OR REPLACE TABLE cor ENGINE=InnoDB AS SELECT seq FROM seq_1_to_10000000
       
      --connect (con2,localhost,root,,)
      # This is a monitor
      --disable_query_log
      while (1)
      {
        --let $status= `show engine innodb status`
        --eval SELECT SUBSTRING("$status", LOCATE("Purge done for trx's n:o", "$status"), 85) AS st
        --sleep 1
      }
      

      bb-main-monty 6cb6e8285c50a0bbcb2b599535b8d02b7bba5256

      Purge done for trx's n:o < 32 undo n:o < 1 state: running
      History list length 4936
      st
      Purge done for trx's n:o < 32 undo n:o < 1 state: running
      History list length 8240
      ...
      ...
      ...
      Purge done for trx's n:o < 32 undo n:o < 1 state: running
      History list length 532453
      

      The above is with the default drop_before_create_or_replace=OFF. With drop_before_create_or_replace=ON the purge keeps happening.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment - - edited

            The underlying problem is that there are no metadata locks acquired on intermediate table names. In MariaDB Server 10.6, it the third part of MDEV-25506, a work-around had been implemented:

            dict_table_t::mdl_name: The original table name for the purpose of acquiring MDL in purge, to prevent a race condition between a DDL transaction that is dropping a table, and purge processing undo log records of DML that had executed before the DDL operation. For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the dict_table_t::mdl_name will differ from dict_table_t::name.

            In MDEV-36122 this work-around turned out to be insufficient. For ALTER IGNORE TABLE, InnoDB would write row-level undo log records while not knowing the final table name.

            I believe that the correct fix of this problem is to remove dict_table_t::mdl_name altogether, by doing the following:

            1. Ensure that all ha_innobase::rename_table() are covered by MDL_EXCLUSIVE on both the source and target names. Add debug assertions to prove this.
            2. Ensure that all ha_innobase::delete_table() are covered by MDL_EXCLUSIVE.
            3. In trx_purge_table_open(), acquire MDL on the same names as those that would be acquired by the SQL layer for invoking handler::rename_table() or handler::delete_table().

            Note that the InnoDB dict_table_t::name is in the filename-safe encoding, while the MDL names currently are in the SQL encoding. Possibly, the MDL names should be something like #mysql50##sql-backup… for these "filename-unsafe" intermediate names.

            I can fix the trx_purge_table_open() part and remove the dict_table_t::mdl_name, once the other parts have been implemented.

            The problem here is that there are purgeable entries for the old copy of the table (before CREATE OR REPLACE TABLE…SELECT). The purge is blocked and using 100% of one CPU thread because the dict_table_t::mdl_name is using the same name as the being-rebuilt table, which is exclusively locked. If we just acquired MDL on the actual table name, then there would be no exclusive lock on the old table name except when the tables are being renamed at the start, as well as renamed or dropped at end of the execution of CREATE OR REPLACE TABLE. The old table could either be dropped or renamed back in the end, depending on whether the operation succeeded. By proper use of MDL, the purge of history would only be blocked for a very short period at the start and end of the CREATE OR REPLACE TABLE statement.

            marko Marko Mäkelä added a comment - - edited The underlying problem is that there are no metadata locks acquired on intermediate table names. In MariaDB Server 10.6, it the third part of MDEV-25506 , a work-around had been implemented: dict_table_t::mdl_name : The original table name for the purpose of acquiring MDL in purge, to prevent a race condition between a DDL transaction that is dropping a table, and purge processing undo log records of DML that had executed before the DDL operation. For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY , the dict_table_t::mdl_name will differ from dict_table_t::name . In MDEV-36122 this work-around turned out to be insufficient. For ALTER IGNORE TABLE , InnoDB would write row-level undo log records while not knowing the final table name. I believe that the correct fix of this problem is to remove dict_table_t::mdl_name altogether, by doing the following: Ensure that all ha_innobase::rename_table() are covered by MDL_EXCLUSIVE on both the source and target names. Add debug assertions to prove this. Ensure that all ha_innobase::delete_table() are covered by MDL_EXCLUSIVE . In trx_purge_table_open() , acquire MDL on the same names as those that would be acquired by the SQL layer for invoking handler::rename_table() or handler::delete_table() . Note that the InnoDB dict_table_t::name is in the filename-safe encoding, while the MDL names currently are in the SQL encoding. Possibly, the MDL names should be something like #mysql50##sql-backup… for these "filename-unsafe" intermediate names. I can fix the trx_purge_table_open() part and remove the dict_table_t::mdl_name , once the other parts have been implemented. The problem here is that there are purgeable entries for the old copy of the table (before CREATE OR REPLACE TABLE…SELECT ). The purge is blocked and using 100% of one CPU thread because the dict_table_t::mdl_name is using the same name as the being-rebuilt table, which is exclusively locked. If we just acquired MDL on the actual table name, then there would be no exclusive lock on the old table name except when the tables are being renamed at the start, as well as renamed or dropped at end of the execution of CREATE OR REPLACE TABLE . The old table could either be dropped or renamed back in the end, depending on whether the operation succeeded. By proper use of MDL, the purge of history would only be blocked for a very short period at the start and end of the CREATE OR REPLACE TABLE statement.

            People

              monty Michael Widenius
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.