|
--let $datadir= `SELECT @@datadir`
|
|
--let $_server_id= `SELECT @@server_id`
|
--let $_expect_file_name=$MYSQLTEST_VARDIR/tmp/mysqld.$_server_id.expect
|
|
CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL) ENGINE=innodb;
|
SET debug_dbug='+d,innodb_alter_commit_crash_before_commit';
|
|
--exec echo "wait" > $_expect_file_name
|
|
--error 2013
|
ALTER TABLE t1 ADD PRIMARY KEY (f2, f1);
|
|
|
--source include/start_mysqld.inc
|
|
show create table t1;
|
|
Before crashing:
================
InnoDB does the following:
1) creates the table t1 (table id 18)
2) creates the table #sql-alter-10b0-4.ibd (table id 19) as part of the ALTER operation
3) renames t1.ibd to #sql-ib19.ibd
4) renames #sql-alter-10b0-4.ibd to t1.ibd
5) drops #sql-ib19.ibd (the file deletion is deferred until commit)
6) the server is killed before the ALTER commits.
Data directory has the following:
=================================
'#sql-alter-10b0-4.frm' '#sql-ib19.ibd' t1.frm t1.ibd
|
InnoDB sys tables
=================
2 entries for table id 18: one is test/t1, the other is test/#sql-ib19.
Both entries are delete-marked.
2 entries for table id 19: one is #sql-alter-10b0-4, the other is test/t1.
The first one is delete-marked; the second one exists.
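The file-level effect of steps 1-6 can be sketched with a toy simulation (the file names and the deferred-drop behaviour follow the description above; this is an illustration only, not InnoDB code):

```python
# Toy simulation of the data directory through the ALTER steps above.
datadir = {"t1.frm", "t1.ibd"}            # step 1: table t1 (table id 18)
datadir.add("#sql-alter-10b0-4.ibd")      # step 2: new table (table id 19)
datadir.add("#sql-alter-10b0-4.frm")      # temporary .frm from the server layer

datadir.remove("t1.ibd")                  # step 3: rename t1.ibd ...
datadir.add("#sql-ib19.ibd")              #         ... to #sql-ib19.ibd

datadir.remove("#sql-alter-10b0-4.ibd")   # step 4: rename #sql-alter-10b0-4.ibd ...
datadir.add("t1.ibd")                     #         ... to t1.ibd

# step 5: the drop of #sql-ib19.ibd is deferred until commit (MDEV-25506),
# and step 6 kills the server first, so the file survives the crash:
print(sorted(datadir))
```

The printed set matches the directory listing above: '#sql-alter-10b0-4.frm', '#sql-ib19.ibd', 't1.frm', 't1.ibd'.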
Recovery:
=========
InnoDB tries to load the table during trx_resurrect():
- loading by table id 18 fails because the SYS_TABLES records are delete-marked.
- loading by table id 19 succeeds and also loads the tablespace (tablespace t1).
Undo log processing done by InnoDB:
InnoDB removes the new entries from SYS_TABLES.
InnoDB undoes the rename operation (t1 to #sql-alter-10b0-4).
While opening the table, InnoDB fails to re-load the tablespace for #sql-alter-10b0-4.ibd
due to the following code in dict_load_tablespace():
	/* The tablespace may already be open. */
	table->space = fil_space_for_table_exists_in_mem(table->space_id,
							 table->flags);
	if (table->space) {
		return;
	}
|
and it also performs the rename successfully. Now the data directory has
'#sql-alter-10b0-4.frm' '#sql-ib19.ibd' t1.frm '#sql-alter-10b0-4.ibd'
InnoDB then tries to undo the rename operation (#sql-ib19.ibd to t1):
- InnoDB tries to open the tablespace for t1 and fails to find the file in the data directory,
which leads to skipping the rename operation. Now, InnoDB no longer has the table t1.
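Continuing the same toy simulation from the crashed state, the two undo steps described above produce the broken directory (again only an illustration of the described behaviour):

```python
# State left behind by the crash (see the directory listing above).
datadir = {"#sql-alter-10b0-4.frm", "#sql-ib19.ibd", "t1.frm", "t1.ibd"}

# Undo of rename (t1 -> #sql-alter-10b0-4) succeeds, because
# dict_load_tablespace() finds the still-open tablespace and returns early:
datadir.remove("t1.ibd")
datadir.add("#sql-alter-10b0-4.ibd")

# Undo of rename (#sql-ib19 -> t1) is skipped, because opening the
# tablespace for t1 fails; no t1.ibd is ever recreated:
assert "t1.ibd" not in datadir
print(sorted(datadir))
```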
|
|
I think that the following failure might be related to this:
atomic.rename_trigger 'innodb'           w4 [ fail ]
        Test ended at 2022-02-07 00:56:07

CURRENT_TEST: atomic.rename_trigger
mysqltest: At line 159: query 'let $res=`select t1.a+t2.b+t3.c+t4.d from t1,t2,t3,t4`' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
…
"engine: innodb crash point: ddl_log_rename_after_failed_rename_trigger position: 3"

More results from queries before failure can be found in /buildbot/amd64-ubuntu-1804-msan/build/mysql-test/var/4/log/rename_trigger.log
 - skipping '/buildbot/amd64-ubuntu-1804-msan/build/mysql-test/var/4/log/atomic.rename_trigger-innodb/'

***Warnings generated in error logs during shutdown after running tests: atomic.rename_trigger
2022-02-07 0:56:06 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/t5.ibd.
2022-02-07 0:56:06 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
2022-02-07 0:56:06 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1445] with error Tablespace not found
|
|
|
Another possibly related failure:
encryption.innodb-redo-badkey 'cbc,innodb,strict_crc32' w4 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2022-02-06 22:48:09
line
2022-02-06 22:48:06 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 5
2022-02-06 22:48:07 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 5
|
|
|
I tried this on 10.8 (after MDEV-14425), but it did not fail:
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index 0273fcd9a03..8f02ff51242 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -1857,6 +1857,7 @@ static bool fil_rename_tablespace(uint32_t id, const char *old_path,
   ut_ad(strchr(new_file_name, '/'));
 
   if (!recv_recovery_is_on()) {
+    log_make_checkpoint();
     log_sys.latch.wr_lock(SRW_LOCK_CALL);
   }
|
|
|
10.8 e8a2a70cf869cf32a50144704646a5f34e8d7be9 (patched)
innodb.instant_alter_crash 'innodb'      [ pass ]   2079
innodb.alter_crash 'innodb'              [ pass ]   2489
atomic.rename_trigger 'innodb'           [ pass ]  33901
|
Also, the crash injection test case recovered fine. Does this still fail on the latest 10.6?
|
|
After I ran the crash injection test on 10.7 af87186c1d3f9eeab73bc38b1faa51ba17fa8017, the var/mysqld.1/data/test only contained db.opt, t1.frm, t1.ibd.
Edit: With the patch, the recovery fails. (But not on 10.8 with the patch.)
|
10.7 af87186c1d3f9eeab73bc38b1faa51ba17fa8017 (patched)
mysqltest: At line 20: query 'show create table t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
|
|
|
10.8 with MDEV-14425 but right before MDEV-27774 fails to recover:
|
10.8 8c7c92adf384b2d0294502d5b38684e68d5eaf27 (patched)
mysqltest: At line 20: query 'show create table t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
|
|
|
Problem exists on 10.6.2 also. The below commit is the reason for the issue:
[~/mariarepo/10.6/10.6-sample]$ git log -p -1 1bd681c8b3c
commit 1bd681c8b3c5213ce1f7976940a7dc38b48a0d39
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Wed Jun 9 17:02:55 2021 +0300

    MDEV-25506 (3 of 3): Do not delete .ibd files before commit
|
|
|
trx_resurrect_table_locks() fails to load the table because the SYS_TABLES entries have a delete-marked record for it. This leads to a failure to load tables during
recovery.
|
|
Thank you. I see that the following code in dict_load_table_on_id() makes an incorrect assumption:
	if (page_rec_is_user_rec(rec)) {
		/*---------------------------------------------------*/
		/* Now we have the record in the secondary index
		containing the table ID and NAME */
check_rec:
		field = rec_get_nth_field_old(
			rec, DICT_FLD__SYS_TABLE_IDS__ID, &len);
		ut_ad(len == 8);

		/* Check if the table id in record is the one searched for */
		if (table_id == mach_read_from_8(field)) {
			if (rec_get_deleted_flag(rec, 0)) {
				/* Until purge has completed, there
				may be delete-marked duplicate records
				for the same SYS_TABLES.ID, but different
				SYS_TABLES.NAME. */
				while (btr_pcur_move_to_next(&pcur, &mtr)) {
					rec = btr_pcur_get_rec(&pcur);

					if (page_rec_is_user_rec(rec)) {
						goto check_rec;
					}
				}
|
This code is essentially implementing READ UNCOMMITTED while we need READ COMMITTED here. It is wrong to filter out delete-marked secondary index records (records of SYS_TABLES.ID). The secondary index indeed only contains delete-marked records for this SYS_TABLES.ID=18: (ID,NAME)=(18,'test/#sql-ib19'),(18,'test/t1'). In the clustered index, I see a delete-marked record for the former and a non-delete-marked record for 'test/t1', with the wrong ID=19 for the most recent (uncommitted) version. Both records contain DB_TRX_ID=25.
Implementing proper READ COMMITTED might be easiest by allocating a transaction with a read view, and adapting some logic from row_search_mvcc() for accessing the dictionary tables. The functions row_sel_sec_rec_is_for_clust_rec() and row_vers_build_for_consistent_read() have to be part of the solution.
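A minimal model may make the visibility problem concrete. The record contents below follow the analysis above (DB_TRX_ID=25 is the uncommitted DDL transaction); the version-list representation is only a stand-in for InnoDB's undo-log-based row versions:

```python
# Toy model of the dictionary lookup: an active DDL transaction (trx 25)
# has delete-marked both SYS_TABLE_IDS records for id 18 and rewritten the
# clustered record of 'test/t1' with id 19. Illustration only.
ACTIVE_TRX = {25}

# Clustered index: name -> row versions, newest first (stand-in for undo).
clust = {
    "test/#sql-ib19": [{"id": 18, "deleted": True,  "trx": 25}],
    "test/t1":        [{"id": 19, "deleted": False, "trx": 25},
                       {"id": 18, "deleted": False, "trx": 10}],  # committed
}
# Secondary index on (id, name); both records are delete-marked.
sec = [(18, "test/#sql-ib19", True), (18, "test/t1", True)]

def read_uncommitted(table_id):
    """Current behaviour: skip delete-marked secondary index records."""
    for tid, name, deleted in sec:
        if tid == table_id and not deleted:
            return name
    return None  # the table appears to be lost

def read_committed(table_id):
    """Needed: per secondary record, use the newest committed clustered
    version, keeping delete-marked records whose deleter is still active."""
    for tid, name, _deleted in sec:
        if tid != table_id:
            continue
        for ver in clust[name]:
            if ver["trx"] not in ACTIVE_TRX:  # newest committed version
                if not ver["deleted"] and ver["id"] == table_id:
                    return name
                break
    return None

print(read_uncommitted(18))  # None
print(read_committed(18))    # test/t1
```

The READ UNCOMMITTED scan finds nothing for id 18, while READ COMMITTED recovers 'test/t1' from the last committed version.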
|
|
I think that we can implement "poor man’s MVCC" for the secondary index scan like this:
diff --git a/storage/innobase/dict/dict0load.cc b/storage/innobase/dict/dict0load.cc
index 71dedd48c29..88351b06201 100644
--- a/storage/innobase/dict/dict0load.cc
+++ b/storage/innobase/dict/dict0load.cc
@@ -2536,11 +2536,16 @@ dict_load_table_on_id(
 
 		/* Check if the table id in record is the one searched for */
 		if (table_id == mach_read_from_8(field)) {
-			if (rec_get_deleted_flag(rec, 0)) {
-				/* Until purge has completed, there
-				may be delete-marked duplicate records
-				for the same SYS_TABLES.ID, but different
-				SYS_TABLES.NAME. */
+			field = rec_get_nth_field_old(rec,
+				DICT_FLD__SYS_TABLE_IDS__NAME, &len);
+			table = dict_sys.load_table(
+				{reinterpret_cast<const char*>(field),
+				 len}, ignore_err);
+			if (table && table->id != table_id) {
+				ut_ad(rec_get_deleted_flag(rec, 0));
+				table = nullptr;
+			}
+			if (!table) {
 				while (btr_pcur_move_to_next(&pcur, &mtr)) {
 					rec = btr_pcur_get_rec(&pcur);
 
@@ -2548,13 +2553,6 @@ dict_load_table_on_id(
 					goto check_rec;
 				}
 			}
-		} else {
-			/* Now we get the table name from the record */
-			field = rec_get_nth_field_old(rec,
-				DICT_FLD__SYS_TABLE_IDS__NAME, &len);
-			table = dict_sys.load_table(
-				{reinterpret_cast<const char*>(field),
-				 len}, ignore_err);
 		}
 	}
 }
|
For the first matching record, dict_sys.load_table() would find nothing, because it is still performing READ UNCOMMITTED and that record is delete-marked. For the second one, it returns a table with the wrong id=19 because it fails to fetch the latest committed version.
Edit: I was thinking to use purge_sys.view for the MVCC read, but I realized that READ COMMITTED semantics would require the newest possible read view instead of the oldest one.
|
|
It may be much easier to roll back a recovered DDL transaction before resurrecting the table locks, by simply continuing to use READ UNCOMMITTED, which thanks to the added rollback would be equivalent to READ COMMITTED.
If we implement READ COMMITTED, it will have to be implemented for accessing all dictionary tables. For example, if both the table and some indexed columns or base columns of virtual columns were renamed in a recovered incomplete DDL transaction, from each dictionary table we must fetch the correct names.
|
|
It turns out that we do have to implement the READ COMMITTED access to the data dictionary. Even if we defer trx_resurrect_table_locks() until after the trx_rollback_active(false) call that would roll back the DDL transaction, that very rollback would fail in its attempt to load the table definition.
|
|
Here is a test that halts the commit and then forces a checkpoint so that the FILE_ record recovery will not interfere with the processing of the undo logs and the data dictionary tables:
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc

CREATE TABLE t1 (a INT NOT NULL) ENGINE=InnoDB STATS_PERSISTENT=0;

connect ddl,localhost,root;
SET DEBUG_SYNC='after_trx_committed_in_memory SIGNAL stuck WAIT_FOR ever EXECUTE 2';
send ALTER TABLE t1 ADD PRIMARY KEY(a);

connection default;
SET DEBUG_SYNC='now WAIT_FOR stuck';
SET DEBUG_SYNC='now SIGNAL ever';
SET DEBUG_SYNC='now WAIT_FOR stuck';

SET GLOBAL innodb_log_checkpoint_now=ON;

--let $shutdown_timeout=0
--source include/restart_mysqld.inc

disconnect ddl;

SHOW CREATE TABLE t1;
SELECT * FROM t1;
DROP TABLE t1;
|
The test actually passes (with SHOW CREATE TABLE wrongly indicating that the table includes a PRIMARY KEY), and an orphan table will be left behind:
|
10.9 35fcae10408aa4c54967d118c8ccde1c3736b85f
MTR's internal check of the test case 'main.MDEV-27234' failed.
…
--- /dev/shm/10.9/mysql-test/var/tmp/check-mysqld_1.result	2022-03-10 10:53:59.764275437 +0200
+++ /dev/shm/10.9/mysql-test/var/tmp/check-mysqld_1.reject	2022-03-10 10:54:00.352281258 +0200
@@ -1092,3 +1092,4 @@
 partition	1.0	DISABLED	STORAGE ENGINE	100900.0	NULL	NULL	Mikael Ronstrom, MySQL AB	Partition Storage Engine Helper	GPL	OFF	Stable	1.0
 VARIABLE_NAME	VARIABLE_VALUE
 DEBUG_SYNC	ON - current signal: ''
+#sql-ib19.ibd

mysqltest: Result length mismatch
|
On 10.6 36a19f94ce102be87b22fdc7ad147ab7d5ed4331 the test behaves in exactly the same way.
|
|
The only dictionary tables that contain any secondary indexes are SYS_TABLES and SYS_FOREIGN. The function dict_load_foreigns() does access the secondary index on SYS_FOREIGN.FOR_NAME, in the READ UNCOMMITTED fashion. We must not filter out delete-marked records there, and we must make dict_load_foreign() retrieve the last committed version, or return a status that no committed record existed.
Every access to the clustered indexes must be corrected in a similar way, so that we will load the latest committed version (READ COMMITTED).
I expect that the function trx_t::evict_table() can be removed as part of fixing this bug.
|
|
data.tar.xz is a data directory with innodb_page_size=64k that might demonstrate this bug. On accessing the table t3 (on which DDL was in progress when the server was killed), the following mismatch would be reported:
|
10.6 dafc5fb9c17637c6233e8a73c890f158d645a37c
2022-03-15 16:16:19 3 [ERROR] InnoDB could not find key no 0 with name Marvão_uidx1 from dict cache for table test/t3
|
|
|
Some adjustment may be needed for recovering instant ALTER TABLE metadata (MDEV-11369 and MDEV-15562). Currently, the rec_get_offsets() call in btr_cur_instant_init_low() assumes that dict_index_t::n_fields corresponds to the READ UNCOMMITTED version of the table definition. An assertion would fail during the first restart of the test innodb.instant_alter_crash.
|
|
trx_t::evict_table() turns out to be necessary for evicting table definitions on the rollback of DDL transactions, for example, on a CREATE TABLE that failed due to incorrect FOREIGN KEY constraints.
My current work-in-progress fix still shows the index mismatch message for data.tar.xz:
2022-03-21 17:11:23 3 [ERROR] Cannot find index Marvão_uidx1 in InnoDB index dictionary.
2022-03-21 17:11:23 3 [ERROR] InnoDB indexes are inconsistent with what defined in .frm for table ./test/t3
2022-03-21 17:11:23 3 [ERROR] InnoDB could not find key no 0 with name Marvão_uidx1 from dict cache for table test/t3
2022-03-21 17:11:23 3 [ERROR] InnoDB: Table test/t3 contains 3 indexes inside InnoDB, which is different from the number of indexes 3 defined in the .frm file. See https://mariadb.com/kb/en/innodb-troubleshooting/
|
|
|
I spent a lot of time on the test case before I confirmed with rr replay that before the server is killed, the ADD PRIMARY KEY was actually committed. The actual reason why the #sql-ib file was not dropped is that the fil_delete_tablespace() call in row_purge_remove_clust_if_poss_low() did not find the tablespace, because dict_check_sys_tables() was skipping delete-marked records of SYS_TABLES.
My work-in-progress fix of that broke a number of mtr tests (mostly IMPORT TABLESPACE) and still fails to recover data.tar.xz correctly.
|
|
The data.tar.xz cannot be helped by this fix. The warning message mentions an unexpectedly missing index. In fact, in dict_load_index_low(), a delete-marked record of SYS_INDEXES for it was encountered and correctly filtered out in my development branch, because no active transaction for the DB_TRX_ID was found. We would need an rr replay trace of such a failure to understand why the DDL recovery apparently chose the wrong .frm file when recovering from a crash during DROP INDEX Marvão_uidx1.
|
|
I was able to fix the recovery of data.tar.xz . It turned out that there was a SYS_INDEXES record for an ADD INDEX stub that had been deleted by a committed transaction. The table->def_trx_id was wrongly updated to the DB_TRX_ID of that record, and InnoDB wrongly informed the DDL log recovery that the ALTER TABLE transaction had been committed inside InnoDB. Yes, we must pay attention to the DB_TRX_ID of committed delete-marked SYS_INDEXES records, but only if the SYS_INDEXES.NAME does not identify the index as incomplete (by starting with a 0xff byte).
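A sketch of the NAME check implied above: in the data dictionary, an uncommitted index stub is marked by a leading 0xff byte in SYS_INDEXES.NAME, so only records without that marker should contribute their DB_TRX_ID (illustration, not InnoDB code):

```python
def index_name_is_incomplete(name: bytes) -> bool:
    """True if SYS_INDEXES.NAME marks the index as an uncommitted stub
    (leading 0xff byte)."""
    return name[:1] == b"\xff"

# The committed DROP INDEX left a delete-marked record whose name does not
# carry the 0xff marker, so its DB_TRX_ID must still be considered:
print(index_name_is_incomplete("Marvão_uidx1".encode("utf-8")))            # False
print(index_name_is_incomplete(b"\xff" + "Marvão_uidx1".encode("utf-8")))  # True
```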
For the crash in the recovery of the test innodb.instant_alter_crash I got an idea: btr_cur_instant_init_low() should perform a READ COMMITTED of the instant ADD COLUMN metadata record. In the case of the crash, that record had been inserted by an uncommitted transaction.
|
|
origin/bb-10.6-MDEV-27234 13dfdd3953c076177f8b61355cb70a887a958b4b 2022-03-24T12:13:54+02:00
behaved well in RQG testing (battery for broad range coverage only).
|