Details
Type: Bug
Status: Stalled
Priority: Critical
Resolution: Unresolved
10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
Description
The test case is non-deterministic; run it with --repeat=N. It fails for me (without rr) within a few attempts on various build types, but this can vary between machines. The test can be run under rr with --rr=-h, although it takes much longer; set N to a high value when using rr.
--source include/have_innodb.inc
--source include/have_partition.inc

--connect (con1,localhost,root,,test)
CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB PARTITION BY HASH (pk) PARTITIONS 4;
CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;
SET SESSION INNODB_LOCK_WAIT_TIMEOUT= 0;
--send
DROP TABLE t1;
--connection default
DROP TABLE t2;
--error 0,ER_NO_SUCH_TABLE
SELECT * FROM t1 PARTITION (p3);

# Cleanup
--connection con1
--reap
--disconnect con1
10.6 5e270ca2:
mysqltest: At line 13: query 'SELECT * FROM t1 PARTITION (p3)' failed with wrong errno ER_FAILED_READ_FROM_PAR_FILE (1696): 'Failed to read from the .par file', instead of (0)...
The failure started happening (or, if it existed before, its probability hugely increased) after this commit in 10.6.8:
commit 8840583a92243f6ac543689148ca79c85fa0a09d
Author: Marko Mäkelä
Date: Fri Mar 18 10:52:08 2022 +0200

    MDEV-27909 InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED ... on DDL
Issue Links
relates to:
- MDEV-17567 Atomic DDL (Closed)
- MDEV-27180 Fully atomic partitioning DDL operations (In Review)
- MDEV-27618 Atomic DDL is not very atomic on partitioned tables (Confirmed)
- MDEV-30583 DB_DUPLICATE_KEY on CREATE OR REPLACE after ALTER TABLE change engine (Stalled)
- MDEV-35811 main.mysqldump-system fails with extra entries in innodb_index_stats (Open)
- MDEV-36352 main.partition_explicit_prune sporadically crashes on loong64 (Open)
I failed to reproduce this under rr, but I reproduced it easily without rr. The following change would fix this for me:
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 515f66d6ba3..4c506998741 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -13613,7 +13613,8 @@ int ha_innobase::delete_table(const char *name)
dict_sys.unfreeze();
}
- const bool skip_wait{table->name.is_temporary()};
+ const bool skip_wait{table->name.is_temporary() ||
+ dict_table_is_partition(table)};
if (table_stats && index_stats &&
Also, if I revert the above change and instead add STATS_PERSISTENT=0 to one of the table definitions, the test does not fail either.
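The effect of the proposed one-line change can be summarized as a toy model. It is built only from what this issue states: the patched skip_wait predicate, the test comment that partitions are dropped despite locking conflicts, the existing expectation of ER_LOCK_WAIT_TIMEOUT for DROP DATABASE locked, and the observation that STATS_PERSISTENT=0 also avoids the failure. ToyTable, DropOutcome and toy_delete_table() are hypothetical names. It is an assumption that skip_wait simply lets the drop proceed instead of failing on the statistics-table lock conflict, and that STATS_PERSISTENT=0 means no such lock is involved; the real ha_innobase::delete_table() does considerably more.

```cpp
// Toy model only; none of these names exist in the server source.
enum class DropOutcome { dropped, lock_wait_timeout };

struct ToyTable {
  bool name_is_temporary;  // stands in for table->name.is_temporary()
  bool is_partition;       // stands in for dict_table_is_partition(table)
  bool stats_persistent;   // STATS_PERSISTENT=1 in the table definition
};

// stats_tables_locked: another transaction holds conflicting locks on the
// persistent statistics tables, and innodb_lock_wait_timeout=0 means the
// dropping transaction cannot wait for them.
DropOutcome toy_delete_table(const ToyTable& t, bool stats_tables_locked)
{
  // The predicate as changed by the proposed patch.
  const bool skip_wait = t.name_is_temporary || t.is_partition;

  // Assumption: without persistent statistics, or without a conflicting
  // lock, there is nothing that could make the drop fail in this model.
  if (!t.stats_persistent || !stats_tables_locked)
    return DropOutcome::dropped;

  // Assumption: skip_wait makes the lock conflict non-fatal, so the drop
  // proceeds; otherwise the conflict surfaces as a lock wait timeout.
  return skip_wait ? DropOutcome::dropped : DropOutcome::lock_wait_timeout;
}
```

In this model a partition of a STATS_PERSISTENT=1 table is dropped even while the statistics tables are locked, whereas a plain STATS_PERSISTENT=1 table still reports the timeout, matching the expectations in the extended test below.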
If we were to apply this code change, then an existing test case could be extended to cover partitioned tables:
diff --git a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
index 6532816bb37..dc67cd41de6 100644
--- a/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
+++ b/mysql-test/suite/innodb/t/innodb_stats_drop_locked.test
@@ -4,12 +4,16 @@
#
-- source include/have_innodb.inc
+-- source include/have_partition.inc
CREATE DATABASE unlocked;
CREATE TABLE unlocked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=0;
CREATE DATABASE locked;
CREATE TABLE locked.t1(a INT PRIMARY KEY) ENGINE=INNODB STATS_PERSISTENT=1;
+CREATE TABLE locked.t1p(pk INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1
+PARTITION BY HASH (pk) PARTITIONS 4;
+
CREATE TABLE innodb_stats_drop_locked (c INT, KEY c_key (c))
ENGINE=INNODB STATS_PERSISTENT=1;
ANALYZE TABLE innodb_stats_drop_locked;
@@ -35,14 +39,26 @@ SHOW CREATE TABLE innodb_stats_drop_locked;
DROP TABLE innodb_stats_drop_locked;
DROP DATABASE unlocked;
+
+# Partitions will always be dropped despite locking conflicts.
+DROP TABLE locked.t1p;
+--error ER_NO_SUCH_TABLE
+SELECT * FROM locked.t1p;
+
--error ER_LOCK_WAIT_TIMEOUT
DROP DATABASE locked;
-- disconnect con1
-- connection default
COMMIT;
+SELECT COUNT(*) FROM mysql.innodb_table_stats WHERE database_name='locked';
+SELECT COUNT(*) FROM mysql.innodb_index_stats WHERE database_name='locked';
+
DROP DATABASE locked;
+SELECT table_name FROM mysql.innodb_table_stats WHERE database_name='locked';
+SELECT table_name FROM mysql.innodb_index_stats WHERE database_name='locked';
+
# the stats should be there
I am not convinced that this really is the correct fix. Dropping a partition could fail for other reasons too, and ha_partition::delete_table() should be able to propagate the error to the caller, instead of blindly ignoring it.
Based on my quick tests, it appears that ha_partition correctly handles a failure to drop the very first partition. If the first partition was successfully dropped, however, it simply assumes that the remaining ones will be dropped as well. Even worse, there is no API for requesting the storage engine to drop multiple partitions atomically. If some partitions were dropped and others were not, we are more or less lost, because there is no way to reach a consistent state.
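The failure pattern described above can be modeled with a small sketch. It is a hedged illustration, not the server code: DropErr, drop_partitions_observed() and drop_partitions_propagating() are hypothetical names, the per-partition results are supplied as input instead of coming from a storage engine, and the "observed" variant encodes only the behavior reported from quick tests (a failure on the first partition is returned, later failures are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-partition results; not the server's error codes.
enum class DropErr { ok, lock_wait_timeout };

// Models the reported behavior (not the real ha_partition::delete_table()):
// only a failure on the first partition reaches the caller.
DropErr drop_partitions_observed(const std::vector<DropErr>& results,
                                 std::vector<bool>& dropped)
{
  assert(!results.empty());  // a partitioned table has at least one partition
  dropped.assign(results.size(), false);
  for (std::size_t i = 0; i < results.size(); i++) {
    if (results[i] == DropErr::ok)
      dropped[i] = true;
    else if (i == 0)
      return results[i];     // nothing dropped yet: the error is reported
    // i > 0: the error is ignored; the caller believes every partition is gone
  }
  return DropErr::ok;
}

// Alternative that stops at the first failure and propagates it. The error
// is reported, but partitions dropped before it stay dropped: without an
// atomic multi-partition drop there is no earlier state to return to.
DropErr drop_partitions_propagating(const std::vector<DropErr>& results,
                                    std::vector<bool>& dropped)
{
  assert(!results.empty());
  dropped.assign(results.size(), false);
  for (std::size_t i = 0; i < results.size(); i++) {
    if (results[i] != DropErr::ok)
      return results[i];
    dropped[i] = true;
  }
  return DropErr::ok;
}
```

For example, if only the fourth of four partitions fails, the first function reports success with one partition left behind, and the second reports the error with three partitions already gone. In neither case is the table in a consistent state, which is why propagating the error alone would not be a complete fix.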