Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29566

Failed to read from the .par file upon concurrent DDL/SELECT with partition pruning

Details

    Description

      The test case is non-deterministic, run with --repeat=N. It fails for me (without rr) within a few attempts on various build types, but it can vary on different machines. The test is rr-able with --rr=-h, although it takes much longer, set repeat=N to a high value for rr.

      --source include/have_innodb.inc
      --source include/have_partition.inc
       
      --connect (con1,localhost,root,,test)
      CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB PARTITION BY HASH (pk) PARTITIONS 4;
      CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;
      SET SESSION INNODB_LOCK_WAIT_TIMEOUT= 0;
      --send
        DROP TABLE t1;
      --connection default
      DROP TABLE t2;
      --error 0,ER_NO_SUCH_TABLE
      SELECT * FROM t1 PARTITION (p3);
       
      # Cleanup
      --connection con1
      --reap
      --disconnect con1
      

      10.6 5e270ca2

      mysqltest: At line 13: query 'SELECT * FROM t1 PARTITION (p3)' failed with wrong errno ER_FAILED_READ_FROM_PAR_FILE (1696): 'Failed to read from the .par file', instead of  (0)...
      

      The failure started happening (or, if it existed before, its probability hugely increased) after this commit in 10.6.8:

      commit 8840583a92243f6ac543689148ca79c85fa0a09d
      Author: Marko Mäkelä
      Date:   Fri Mar 18 10:52:08 2022 +0200
       
          MDEV-27909 InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED ... on DDL
      

      Attachments

        Issue Links

          Activity

            I failed to reproduce this under rr, but I reproduced it easily without rr. The following change would fix this for me:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 515f66d6ba3..4c506998741 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -13613,7 +13613,8 @@ int ha_innobase::delete_table(const char *name)
                   dict_sys.unfreeze();
                 }
             
            -    const bool skip_wait{table->name.is_temporary()};
            +    const bool skip_wait{table->name.is_temporary() ||
            +                         dict_table_is_partition(table)};
             
                 if (table_stats && index_stats &&
                     !strcmp(table_stats->name.m_name, TABLE_STATS_NAME) &&
            

            Even if I revert the above and add STATS_PERSISTENT=0 to one of the table definitions, then the test will not fail either.

            If we were to apply this code change, then an existing test case could be extended to cover partitioned tables:

            diff --git a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
            index 6532816bb37..dc67cd41de6 100644
            --- a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
            +++ b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
            @@ -4,12 +4,16 @@
             #
             
             -- source include/have_innodb.inc
            +-- source include/have_partition.inc
             
             CREATE DATABASE unlocked;
             CREATE TABLE unlocked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=0;
             CREATE DATABASE locked;
             CREATE TABLE locked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=1;
             
            +CREATE TABLE locked.t1p(pk INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1
            +PARTITION BY HASH (pk) PARTITIONS 4;
            +
             CREATE TABLE innodb_stats_drop_locked (c INT, KEY c_key (c))
             ENGINE=INNODB STATS_PERSISTENT=1;
             ANALYZE TABLE innodb_stats_drop_locked;
            @@ -35,14 +39,26 @@ SHOW CREATE TABLE innodb_stats_drop_locked;
             DROP TABLE innodb_stats_drop_locked;
             
             DROP DATABASE unlocked;
            +
            +# Partitions will always be dropped despite locking conflicts.
            +DROP TABLE locked.t1p;
            +--error ER_NO_SUCH_TABLE
            +SELECT * FROM locked.t1p;
            +
             --error ER_LOCK_WAIT_TIMEOUT
             DROP DATABASE locked;
             -- disconnect con1
             -- connection default
             COMMIT;
             
            +SELECT COUNT(*) FROM mysql.innodb_table_stats WHERE database_name='locked';
            +SELECT COUNT(*) FROM mysql.innodb_index_stats WHERE database_name='locked';
            +
             DROP DATABASE locked;
             
            +SELECT table_name FROM mysql.innodb_table_stats WHERE database_name='locked';
            +SELECT table_name FROM mysql.innodb_index_stats WHERE database_name='locked';
            +
             # the stats should be there
             
             SELECT table_name FROM mysql.innodb_table_stats
            

            I am not convinced that this really is the correct fix. Dropping a partition could fail for other reasons too, and ha_partition::delete_table() should be able to propagate the error to the caller, instead of blindly ignoring it.

            Based on my quick tests, it appears that ha_partition would correctly handle the failure to drop the very first partition. If the first partition was successfully dropped, it will just assume that the rest will succeed. Even worse, there is no API that would request the storage engine to drop multiple partitions atomically. If some partitions were successfully dropped and some not, we are sort-of lost, because there is no way to reach a consistent state.

            marko Marko Mäkelä added a comment - I failed to reproduce this under rr , but I reproduced it easily without rr . The following change would fix this for me: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 515f66d6ba3..4c506998741 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -13613,7 +13613,8 @@ int ha_innobase::delete_table(const char *name) dict_sys.unfreeze(); } - const bool skip_wait{table->name.is_temporary()}; + const bool skip_wait{table->name.is_temporary() || + dict_table_is_partition(table)}; if (table_stats && index_stats && !strcmp(table_stats->name.m_name, TABLE_STATS_NAME) && Even if I revert the above and add STATS_PERSISTENT=0 to one of the table definitions, then the test will not fail either. If we were to apply this code change, then an existing test case could be extended to cover partitioned tables: diff --git a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test index 6532816bb37..dc67cd41de6 100644 --- a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test +++ b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test @@ -4,12 +4,16 @@ # -- source include/have_innodb.inc +-- source include/have_partition.inc CREATE DATABASE unlocked; CREATE TABLE unlocked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=0; CREATE DATABASE locked; CREATE TABLE locked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=1; +CREATE TABLE locked.t1p(pk INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1 +PARTITION BY HASH (pk) PARTITIONS 4; + CREATE TABLE innodb_stats_drop_locked (c INT, KEY c_key (c)) ENGINE=INNODB STATS_PERSISTENT=1; ANALYZE TABLE innodb_stats_drop_locked; @@ -35,14 +39,26 @@ SHOW CREATE TABLE innodb_stats_drop_locked; DROP TABLE innodb_stats_drop_locked; DROP DATABASE unlocked; + +# Partitions will always be dropped despite locking conflicts. +DROP TABLE locked.t1p; +--error ER_NO_SUCH_TABLE +SELECT * FROM locked.t1p; + --error ER_LOCK_WAIT_TIMEOUT DROP DATABASE locked; -- disconnect con1 -- connection default COMMIT; +SELECT COUNT(*) FROM mysql.innodb_table_stats WHERE database_name='locked'; +SELECT COUNT(*) FROM mysql.innodb_index_stats WHERE database_name='locked'; + DROP DATABASE locked; +SELECT table_name FROM mysql.innodb_table_stats WHERE database_name='locked'; +SELECT table_name FROM mysql.innodb_index_stats WHERE database_name='locked'; + # the stats should be there SELECT table_name FROM mysql.innodb_table_stats I am not convinced that this really is the correct fix. Dropping a partition could fail for other reasons too, and ha_partition::delete_table() should be able to propagate the error to the caller, instead of blindly ignoring it. Based on my quick tests, it appears that ha_partition would correctly handle the failure to drop the very first partition. If the first partition was successfully dropped, it will just assume that the rest will succeed. Even worse, there is no API that would request the storage engine to drop multiple partitions atomically. If some partitions were successfully dropped and some not, we are sort-of lost, because there is no way to reach a consistent state.

            According to monty, handler::delete_table() is currently indeed assumed not to fail as part of ha_partition operations.

            Based on that, I will proceed with my above fix. That is, if acquiring an exclusive table lock on mysql.innodb_table_stats or mysql.innodb_index_stats fails, ha_innobase::delete_table() will proceed to remove the partition, without attempting to delete the corresponding records from those persistent statistics tables.

            marko Marko Mäkelä added a comment - According to monty , handler::delete_table() is currently indeed assumed not to fail as part of ha_partition operations. Based on that, I will proceed with my above fix. That is, if acquiring an exclusive table lock on mysql.innodb_table_stats or mysql.innodb_index_stats fails, ha_innobase::delete_table() will proceed to remove the partition, without attempting to delete the corresponding records from those persistent statistics tables.

            My fix seems to have introduced sporadic test failures:

            parts.part_supported_sql_func_innodb 'innodb' w3 [ fail ]
                    Test ended at 2022-09-26 14:07:46
             
            CURRENT_TEST: parts.part_supported_sql_func_innodb
            mysqltest: In included file "./suite/parts/inc/partition_supported_sql_funcs.inc": 
            included from ./suite/parts/inc/part_supported_sql_funcs_main.inc at line 133:
            included from /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/t/part_supported_sql_func_innodb.test at line 44:
            At line 234: query 'alter table t66
            reorganize partition s1 into
            (partition p0 values less than ($valsqlfunc),
            partition p1 values less than maxvalue)' failed: ER_DUP_KEY (1022): Can't write; duplicate key in table 'mysql.innodb_table_stats'
            

            This could be worked around by disabling InnoDB persistent statistics in all tests of partitioned tables, or by setting innodb_stats_auto_recalc=OFF so that the background task of recomputing statistics would not compete for table locks with ha_innobase::delete_table().

            I think that we really need an API that would allow DDL operations on partitioned tables to be executed in an atomic transaction.

            marko Marko Mäkelä added a comment - My fix seems to have introduced sporadic test failures : parts.part_supported_sql_func_innodb 'innodb' w3 [ fail ] Test ended at 2022-09-26 14:07:46   CURRENT_TEST: parts.part_supported_sql_func_innodb mysqltest: In included file "./suite/parts/inc/partition_supported_sql_funcs.inc": included from ./suite/parts/inc/part_supported_sql_funcs_main.inc at line 133: included from /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/t/part_supported_sql_func_innodb.test at line 44: At line 234: query 'alter table t66 reorganize partition s1 into (partition p0 values less than ($valsqlfunc), partition p1 values less than maxvalue)' failed: ER_DUP_KEY (1022): Can't write; duplicate key in table 'mysql.innodb_table_stats' This could be worked around by disabling InnoDB persistent statistics in all tests of partitioned tables, or by setting innodb_stats_auto_recalc=OFF so that the background task of recomputing statistics would not compete for table locks with ha_innobase::delete_table() . I think that we really need an API that would allow DDL operations on partitioned tables to be executed in an atomic transaction.

            I reverted the work-around because of the regression.

            I think that fixing this involves changing code outside InnoDB. Ideally, all partitioning ALTER TABLE operations should use a single atomic transaction inside the storage engine, so that in the test case of the Description, DROP TABLE t1 would be atomic, as implied or ‘promised’ by MDEV-17567.

            marko Marko Mäkelä added a comment - I reverted the work-around because of the regression. I think that fixing this involves changing code outside InnoDB. Ideally, all partitioning ALTER TABLE operations should use a single atomic transaction inside the storage engine, so that in the test case of the Description, DROP TABLE t1 would be atomic, as implied or ‘promised’ by MDEV-17567 .

            Some regression tests may also fail due to this bug. Here is one example from buildbot:

            10.6 bb29712b450525ba9e956033387361830361932f

            parts.partition_special_innodb 'innodb'  w6 [ fail ]
                    Test ended at 2022-11-29 17:28:21
             
            CURRENT_TEST: parts.partition_special_innodb
            mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction
            

            marko Marko Mäkelä added a comment - Some regression tests may also fail due to this bug. Here is one example from buildbot: 10.6 bb29712b450525ba9e956033387361830361932f parts.partition_special_innodb 'innodb' w6 [ fail ] Test ended at 2022-11-29 17:28:21   CURRENT_TEST: parts.partition_special_innodb mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction

            While testing MDEV-30100, mleich reproduced an rr replay trace for a case where DROP TABLE IF EXISTS n failed to drop the second partition of the table, and subsequent attempts to access the table would result in ER_FAILED_READ_FROM_PAR_FILE.

            marko Marko Mäkelä added a comment - While testing MDEV-30100 , mleich reproduced an rr replay trace for a case where DROP TABLE IF EXISTS n failed to drop the second partition of the table, and subsequent attempts to access the table would result in ER_FAILED_READ_FROM_PAR_FILE .

            The following failure that occasionally occurs in any branch since 10.6 could be related to InnoDB persistent statistics and this design problem:

            CURRENT_TEST: parts.partition_special_innodb
            mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction
            

            In MariaDB Server 10.0 to 10.5, the regression test disable InnoDB persistent statistics; see MDEV-4750.

            marko Marko Mäkelä added a comment - The following failure that occasionally occurs in any branch since 10.6 could be related to InnoDB persistent statistics and this design problem: CURRENT_TEST: parts.partition_special_innodb mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction In MariaDB Server 10.0 to 10.5, the regression test disable InnoDB persistent statistics; see MDEV-4750 .

            People

              monty Michael Widenius
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.