[MDEV-29566] Failed to read from the .par file upon concurrent DDL/SELECT with partition pruning - Jira

Details

Type: Bug
Status: Stalled (View Workflow)
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
Fix Version/s: 10.6, 10.11
Component/s: Partitioning, Storage Engine - InnoDB
Labels:
- regression

Description

The test case is non-deterministic, run with --repeat=N. It fails for me (without rr) within a few attempts on various build types, but it can vary on different machines. The test is rr-able with --rr=-h, although it takes much longer, set repeat=N to a high value for rr.

--source include/have_innodb.inc

--source include/have_partition.inc

--connect (con1,localhost,root,,test)

CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB PARTITION BY HASH (pk) PARTITIONS 4;

CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;

SET SESSION INNODB_LOCK_WAIT_TIMEOUT= 0;

--send

  DROP TABLE t1;

--connection default

DROP TABLE t2;

--error 0,ER_NO_SUCH_TABLE

SELECT * FROM t1 PARTITION (p3);

# Cleanup

--connection con1

--reap

--disconnect con1

10.6 5e270ca2
mysqltest: At line 13: query 'SELECT * FROM t1 PARTITION (p3)' failed with wrong errno ER_FAILED_READ_FROM_PAR_FILE (1696): 'Failed to read from the .par file', instead of (0)...

The failure started happening (or, if it existed before, its probability hugely increased) after this commit in 10.6.8:

commit 8840583a92243f6ac543689148ca79c85fa0a09d

Author: Marko Mäkelä

Date:   Fri Mar 18 10:52:08 2022 +0200

    MDEV-27909 InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED ... on DDL

Attachments

Issue Links

relates to

MDEV-17567 Atomic DDL

Closed

MDEV-27180 Fully atomic partitioning DDL operations

In Review

MDEV-27618 Atomic DDL is not very atomic on partitioned tables

Confirmed

MDEV-30583 DB_DUPLICATE_KEY on CREATE OR REPLACE after ALTER TABLE change engine

Stalled

MDEV-35811 main.mysqldump-system fails with extra entries in innodb_index_stats

Open

MDEV-36352 main.partition_explicit_prune sporadically crashes on loong64

Open

(1 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2022-09-19 08:56

I failed to reproduce this under rr, but I reproduced it easily without rr. The following change would fix this for me:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc

index 515f66d6ba3..4c506998741 100644

--- a/storage/innobase/handler/ha_innodb.cc

+++ b/storage/innobase/handler/ha_innodb.cc

@@ -13613,7 +13613,8 @@ int ha_innobase::delete_table(const char *name)

       dict_sys.unfreeze();

-    const bool skip_wait{table->name.is_temporary()};

+    const bool skip_wait{table->name.is_temporary() ||

+                         dict_table_is_partition(table)};

     if (table_stats && index_stats &&

         !strcmp(table_stats->name.m_name, TABLE_STATS_NAME) &&

Even if I revert the above and add STATS_PERSISTENT=0 to one of the table definitions, then the test will not fail either.

If we were to apply this code change, then an existing test case could be extended to cover partitioned tables:

diff --git a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test

index 6532816bb37..dc67cd41de6 100644

--- a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test

+++ b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test

@@ -4,12 +4,16 @@

 -- source include/have_innodb.inc

+-- source include/have_partition.inc

 CREATE DATABASE unlocked;

 CREATE TABLE unlocked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=0;

 CREATE DATABASE locked;

 CREATE TABLE locked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=1;

+CREATE TABLE locked.t1p(pk INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1

+PARTITION BY HASH (pk) PARTITIONS 4;

 CREATE TABLE innodb_stats_drop_locked (c INT, KEY c_key (c))

 ENGINE=INNODB STATS_PERSISTENT=1;

 ANALYZE TABLE innodb_stats_drop_locked;

@@ -35,14 +39,26 @@ SHOW CREATE TABLE innodb_stats_drop_locked;

 DROP TABLE innodb_stats_drop_locked;

 DROP DATABASE unlocked;

+# Partitions will always be dropped despite locking conflicts.

+DROP TABLE locked.t1p;

+--error ER_NO_SUCH_TABLE

+SELECT * FROM locked.t1p;

 --error ER_LOCK_WAIT_TIMEOUT

 DROP DATABASE locked;

 -- disconnect con1

 -- connection default

 COMMIT;

+SELECT COUNT(*) FROM mysql.innodb_table_stats WHERE database_name='locked';

+SELECT COUNT(*) FROM mysql.innodb_index_stats WHERE database_name='locked';

 DROP DATABASE locked;

+SELECT table_name FROM mysql.innodb_table_stats WHERE database_name='locked';

+SELECT table_name FROM mysql.innodb_index_stats WHERE database_name='locked';

 # the stats should be there

 SELECT table_name FROM mysql.innodb_table_stats

I am not convinced that this really is the correct fix. Dropping a partition could fail for other reasons too, and ha_partition::delete_table() should be able to propagate the error to the caller, instead of blindly ignoring it.

Based on my quick tests, it appears that ha_partition would correctly handle the failure to drop the very first partition. If the first partition was successfully dropped, it will just assume that the rest will succeed. Even worse, there is no API that would request the storage engine to drop multiple partitions atomically. If some partitions were successfully dropped and some not, we are sort-of lost, because there is no way to reach a consistent state.

Marko Mäkelä added a comment - 2022-09-19 08:56 I failed to reproduce this under rr , but I reproduced it easily without rr . The following change would fix this for me: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 515f66d6ba3..4c506998741 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -13613,7 +13613,8 @@ int ha_innobase::delete_table(const char *name) dict_sys.unfreeze(); } - const bool skip_wait{table->name.is_temporary()}; + const bool skip_wait{table->name.is_temporary() || + dict_table_is_partition(table)}; if (table_stats && index_stats && !strcmp(table_stats->name.m_name, TABLE_STATS_NAME) && Even if I revert the above and add STATS_PERSISTENT=0 to one of the table definitions, then the test will not fail either. If we were to apply this code change, then an existing test case could be extended to cover partitioned tables: diff --git a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test index 6532816bb37..dc67cd41de6 100644 --- a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test +++ b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test @@ -4,12 +4,16 @@ # -- source include/have_innodb.inc +-- source include/have_partition.inc CREATE DATABASE unlocked; CREATE TABLE unlocked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=0; CREATE DATABASE locked; CREATE TABLE locked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=1; +CREATE TABLE locked.t1p(pk INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1 +PARTITION BY HASH (pk) PARTITIONS 4; + CREATE TABLE innodb_stats_drop_locked (c INT, KEY c_key (c)) ENGINE=INNODB STATS_PERSISTENT=1; ANALYZE TABLE innodb_stats_drop_locked; @@ -35,14 +39,26 @@ SHOW CREATE TABLE innodb_stats_drop_locked; DROP TABLE innodb_stats_drop_locked; DROP DATABASE unlocked; + +# Partitions will always be dropped despite locking conflicts. +DROP TABLE locked.t1p; +--error ER_NO_SUCH_TABLE +SELECT * FROM locked.t1p; + --error ER_LOCK_WAIT_TIMEOUT DROP DATABASE locked; -- disconnect con1 -- connection default COMMIT; +SELECT COUNT(*) FROM mysql.innodb_table_stats WHERE database_name='locked'; +SELECT COUNT(*) FROM mysql.innodb_index_stats WHERE database_name='locked'; + DROP DATABASE locked; +SELECT table_name FROM mysql.innodb_table_stats WHERE database_name='locked'; +SELECT table_name FROM mysql.innodb_index_stats WHERE database_name='locked'; + # the stats should be there SELECT table_name FROM mysql.innodb_table_stats I am not convinced that this really is the correct fix. Dropping a partition could fail for other reasons too, and ha_partition::delete_table() should be able to propagate the error to the caller, instead of blindly ignoring it. Based on my quick tests, it appears that ha_partition would correctly handle the failure to drop the very first partition. If the first partition was successfully dropped, it will just assume that the rest will succeed. Even worse, there is no API that would request the storage engine to drop multiple partitions atomically. If some partitions were successfully dropped and some not, we are sort-of lost, because there is no way to reach a consistent state.

Marko Mäkelä added a comment - 2022-09-26 12:08

According to monty, handler::delete_table() is currently indeed assumed not to fail as part of ha_partition operations.

Based on that, I will proceed with my above fix. That is, if acquiring an exclusive table lock on mysql.innodb_table_stats or mysql.innodb_index_stats fails, ha_innobase::delete_table() will proceed to remove the partition, without attempting to delete the corresponding records from those persistent statistics tables.

Marko Mäkelä added a comment - 2022-09-26 12:08 According to monty , handler::delete_table() is currently indeed assumed not to fail as part of ha_partition operations. Based on that, I will proceed with my above fix. That is, if acquiring an exclusive table lock on mysql.innodb_table_stats or mysql.innodb_index_stats fails, ha_innobase::delete_table() will proceed to remove the partition, without attempting to delete the corresponding records from those persistent statistics tables.

Marko Mäkelä added a comment - 2022-09-27 07:23

My fix seems to have introduced sporadic test failures:

parts.part_supported_sql_func_innodb 'innodb' w3 [ fail ]

        Test ended at 2022-09-26 14:07:46

CURRENT_TEST: parts.part_supported_sql_func_innodb

mysqltest: In included file "./suite/parts/inc/partition_supported_sql_funcs.inc":

included from ./suite/parts/inc/part_supported_sql_funcs_main.inc at line 133:

included from /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/t/part_supported_sql_func_innodb.test at line 44:

At line 234: query 'alter table t66

reorganize partition s1 into

(partition p0 values less than ($valsqlfunc),

partition p1 values less than maxvalue)' failed: ER_DUP_KEY (1022): Can't write; duplicate key in table 'mysql.innodb_table_stats'

This could be worked around by disabling InnoDB persistent statistics in all tests of partitioned tables, or by setting innodb_stats_auto_recalc=OFF so that the background task of recomputing statistics would not compete for table locks with ha_innobase::delete_table().

I think that we really need an API that would allow DDL operations on partitioned tables to be executed in an atomic transaction.

Marko Mäkelä added a comment - 2022-09-27 07:23 My fix seems to have introduced sporadic test failures : parts.part_supported_sql_func_innodb 'innodb' w3 [ fail ] Test ended at 2022-09-26 14:07:46 CURRENT_TEST: parts.part_supported_sql_func_innodb mysqltest: In included file "./suite/parts/inc/partition_supported_sql_funcs.inc": included from ./suite/parts/inc/part_supported_sql_funcs_main.inc at line 133: included from /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/t/part_supported_sql_func_innodb.test at line 44: At line 234: query 'alter table t66 reorganize partition s1 into (partition p0 values less than ($valsqlfunc), partition p1 values less than maxvalue)' failed: ER_DUP_KEY (1022): Can't write; duplicate key in table 'mysql.innodb_table_stats' This could be worked around by disabling InnoDB persistent statistics in all tests of partitioned tables, or by setting innodb_stats_auto_recalc=OFF so that the background task of recomputing statistics would not compete for table locks with ha_innobase::delete_table() . I think that we really need an API that would allow DDL operations on partitioned tables to be executed in an atomic transaction.

Marko Mäkelä added a comment - 2022-09-27 07:58

I reverted the work-around because of the regression.

I think that fixing this involves changing code outside InnoDB. Ideally, all partitioning ALTER TABLE operations should use a single atomic transaction inside the storage engine, so that in the test case of the Description, DROP TABLE t1 would be atomic, as implied or ‘promised’ by ~~MDEV-17567~~.

Marko Mäkelä added a comment - 2022-09-27 07:58 I reverted the work-around because of the regression. I think that fixing this involves changing code outside InnoDB. Ideally, all partitioning ALTER TABLE operations should use a single atomic transaction inside the storage engine, so that in the test case of the Description, DROP TABLE t1 would be atomic, as implied or ‘promised’ by MDEV-17567 .

Marko Mäkelä added a comment - 2022-12-13 07:11

Some regression tests may also fail due to this bug. Here is one example from buildbot:

10.6 bb29712b450525ba9e956033387361830361932f
parts.partition_special_innodb 'innodb' w6 [ fail ]
Test ended at 2022-11-29 17:28:21

CURRENT_TEST: parts.partition_special_innodb
mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction

Marko Mäkelä added a comment - 2022-12-13 07:11 Some regression tests may also fail due to this bug. Here is one example from buildbot: 10.6 bb29712b450525ba9e956033387361830361932f parts.partition_special_innodb 'innodb' w6 [ fail ] Test ended at 2022-11-29 17:28:21 CURRENT_TEST: parts.partition_special_innodb mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction

Marko Mäkelä added a comment - 2023-08-16 06:05

While testing ~~MDEV-30100~~, mleich reproduced an rr replay trace for a case where DROP TABLE IF EXISTS n failed to drop the second partition of the table, and subsequent attempts to access the table would result in ER_FAILED_READ_FROM_PAR_FILE.

Marko Mäkelä added a comment - 2023-08-16 06:05 While testing MDEV-30100 , mleich reproduced an rr replay trace for a case where DROP TABLE IF EXISTS n failed to drop the second partition of the table, and subsequent attempts to access the table would result in ER_FAILED_READ_FROM_PAR_FILE .

Marko Mäkelä added a comment - 2023-10-16 06:43

The following failure that occasionally occurs in any branch since 10.6 could be related to InnoDB persistent statistics and this design problem:

CURRENT_TEST: parts.partition_special_innodb

mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction

In MariaDB Server 10.0 to 10.5, the regression test disable InnoDB persistent statistics; see ~~MDEV-4750~~.

Marko Mäkelä added a comment - 2023-10-16 06:43 The following failure that occasionally occurs in any branch since 10.6 could be related to InnoDB persistent statistics and this design problem: CURRENT_TEST: parts.partition_special_innodb mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction In MariaDB Server 10.0 to 10.5, the regression test disable InnoDB persistent statistics; see MDEV-4750 .

People

Assignee:: Michael Widenius

Reporter:: Elena Stepanova

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2022-09-18 16:52

Updated:: 2025-03-24 06:31

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.