[MDEV-29566] Failed to read from the .par file upon concurrent DDL/SELECT with partition pruning Created: 2022-09-18  Updated: 2023-11-28

Status: Stalled
Project: MariaDB Server
Component/s: Partitioning, Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.6, 10.11

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: regression

Issue Links:
Relates
relates to MDEV-17567 Atomic DDL Closed
relates to MDEV-27180 Fully atomic partitioning DDL operations Stalled
relates to MDEV-30583 DB_DUPLICATE_KEY on CREATE OR REPLACE... Stalled

 Description   

The test case is non-deterministic, run with --repeat=N. It fails for me (without rr) within a few attempts on various build types, but it can vary on different machines. The test is rr-able with --rr=-h, although it takes much longer, set repeat=N to a high value for rr.

--source include/have_innodb.inc
--source include/have_partition.inc
 
--connect (con1,localhost,root,,test)
CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB PARTITION BY HASH (pk) PARTITIONS 4;
CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;
SET SESSION INNODB_LOCK_WAIT_TIMEOUT= 0;
--send
  DROP TABLE t1;
--connection default
DROP TABLE t2;
--error 0,ER_NO_SUCH_TABLE
SELECT * FROM t1 PARTITION (p3);
 
# Cleanup
--connection con1
--reap
--disconnect con1

10.6 5e270ca2

mysqltest: At line 13: query 'SELECT * FROM t1 PARTITION (p3)' failed with wrong errno ER_FAILED_READ_FROM_PAR_FILE (1696): 'Failed to read from the .par file', instead of  (0)...

The failure started happening (or, if it existed before, its probability hugely increased) after this commit in 10.6.8:

commit 8840583a92243f6ac543689148ca79c85fa0a09d
Author: Marko Mäkelä
Date:   Fri Mar 18 10:52:08 2022 +0200
 
    MDEV-27909 InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED ... on DDL



 Comments   
Comment by Marko Mäkelä [ 2022-09-19 ]

I failed to reproduce this under rr, but I reproduced it easily without rr. The following change would fix this for me:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 515f66d6ba3..4c506998741 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -13613,7 +13613,8 @@ int ha_innobase::delete_table(const char *name)
       dict_sys.unfreeze();
     }
 
-    const bool skip_wait{table->name.is_temporary()};
+    const bool skip_wait{table->name.is_temporary() ||
+                         dict_table_is_partition(table)};
 
     if (table_stats && index_stats &&
         !strcmp(table_stats->name.m_name, TABLE_STATS_NAME) &&

Even if I revert the above and add STATS_PERSISTENT=0 to one of the table definitions, then the test will not fail either.

If we were to apply this code change, then an existing test case could be extended to cover partitioned tables:

diff --git a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
index 6532816bb37..dc67cd41de6 100644
--- a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
+++ b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
@@ -4,12 +4,16 @@
 #
 
 -- source include/have_innodb.inc
+-- source include/have_partition.inc
 
 CREATE DATABASE unlocked;
 CREATE TABLE unlocked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=0;
 CREATE DATABASE locked;
 CREATE TABLE locked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=1;
 
+CREATE TABLE locked.t1p(pk INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1
+PARTITION BY HASH (pk) PARTITIONS 4;
+
 CREATE TABLE innodb_stats_drop_locked (c INT, KEY c_key (c))
 ENGINE=INNODB STATS_PERSISTENT=1;
 ANALYZE TABLE innodb_stats_drop_locked;
@@ -35,14 +39,26 @@ SHOW CREATE TABLE innodb_stats_drop_locked;
 DROP TABLE innodb_stats_drop_locked;
 
 DROP DATABASE unlocked;
+
+# Partitions will always be dropped despite locking conflicts.
+DROP TABLE locked.t1p;
+--error ER_NO_SUCH_TABLE
+SELECT * FROM locked.t1p;
+
 --error ER_LOCK_WAIT_TIMEOUT
 DROP DATABASE locked;
 -- disconnect con1
 -- connection default
 COMMIT;
 
+SELECT COUNT(*) FROM mysql.innodb_table_stats WHERE database_name='locked';
+SELECT COUNT(*) FROM mysql.innodb_index_stats WHERE database_name='locked';
+
 DROP DATABASE locked;
 
+SELECT table_name FROM mysql.innodb_table_stats WHERE database_name='locked';
+SELECT table_name FROM mysql.innodb_index_stats WHERE database_name='locked';
+
 # the stats should be there
 
 SELECT table_name FROM mysql.innodb_table_stats

I am not convinced that this really is the correct fix. Dropping a partition could fail for other reasons too, and ha_partition::delete_table() should be able to propagate the error to the caller, instead of blindly ignoring it.

Based on my quick tests, it appears that ha_partition would correctly handle the failure to drop the very first partition. If the first partition was successfully dropped, it will just assume that the rest will succeed. Even worse, there is no API that would request the storage engine to drop multiple partitions atomically. If some partitions were successfully dropped and some not, we are sort-of lost, because there is no way to reach a consistent state.

Comment by Marko Mäkelä [ 2022-09-26 ]

According to monty, handler::delete_table() is currently indeed assumed not to fail as part of ha_partition operations.

Based on that, I will proceed with my above fix. That is, if acquiring an exclusive table lock on mysql.innodb_table_stats or mysql.innodb_index_stats fails, ha_innobase::delete_table() will proceed to remove the partition, without attempting to delete the corresponding records from those persistent statistics tables.

Comment by Marko Mäkelä [ 2022-09-27 ]

My fix seems to have introduced sporadic test failures:

parts.part_supported_sql_func_innodb 'innodb' w3 [ fail ]
        Test ended at 2022-09-26 14:07:46
 
CURRENT_TEST: parts.part_supported_sql_func_innodb
mysqltest: In included file "./suite/parts/inc/partition_supported_sql_funcs.inc": 
included from ./suite/parts/inc/part_supported_sql_funcs_main.inc at line 133:
included from /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/t/part_supported_sql_func_innodb.test at line 44:
At line 234: query 'alter table t66
reorganize partition s1 into
(partition p0 values less than ($valsqlfunc),
partition p1 values less than maxvalue)' failed: ER_DUP_KEY (1022): Can't write; duplicate key in table 'mysql.innodb_table_stats'

This could be worked around by disabling InnoDB persistent statistics in all tests of partitioned tables, or by setting innodb_stats_auto_recalc=OFF so that the background task of recomputing statistics would not compete for table locks with ha_innobase::delete_table().

I think that we really need an API that would allow DDL operations on partitioned tables to be executed in an atomic transaction.

Comment by Marko Mäkelä [ 2022-09-27 ]

I reverted the work-around because of the regression.

I think that fixing this involves changing code outside InnoDB. Ideally, all partitioning ALTER TABLE operations should use a single atomic transaction inside the storage engine, so that in the test case of the Description, DROP TABLE t1 would be atomic, as implied or ‘promised’ by MDEV-17567.

Comment by Marko Mäkelä [ 2022-12-13 ]

Some regression tests may also fail due to this bug. Here is one example from buildbot:

10.6 bb29712b450525ba9e956033387361830361932f

parts.partition_special_innodb 'innodb'  w6 [ fail ]
        Test ended at 2022-11-29 17:28:21
 
CURRENT_TEST: parts.partition_special_innodb
mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction

Comment by Marko Mäkelä [ 2023-08-16 ]

While testing MDEV-30100, mleich reproduced an rr replay trace for a case where DROP TABLE IF EXISTS n failed to drop the second partition of the table, and subsequent attempts to access the table would result in ER_FAILED_READ_FROM_PAR_FILE.

Comment by Marko Mäkelä [ 2023-10-16 ]

The following failure that occasionally occurs in any branch since 10.6 could be related to InnoDB persistent statistics and this design problem:

CURRENT_TEST: parts.partition_special_innodb
mysqltest: At line 119: query 'DROP TABLE t1' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction

In MariaDB Server 10.0 to 10.5, the regression test disable InnoDB persistent statistics; see MDEV-4750.

Generated at Thu Feb 08 10:09:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.