MariaDB Server
MDEV-27234

InnoDB dictionary recovery wrongly uses READ UNCOMMITTED isolation level instead of READ COMMITTED

Details

    Description

      The following patch adds an extra checkpoint in fil_rename_tablespace(). This leads
      to a failure to write a few tablespace file names during fil_names_clear().

      diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
      index 681cee32fd5..0b6e4764d98 100644
      --- a/storage/innobase/fil/fil0fil.cc
      +++ b/storage/innobase/fil/fil0fil.cc
      @@ -1918,6 +1918,7 @@ fil_rename_tablespace(
              ut_ad(strchr(new_file_name, '/'));
       
              if (!recv_recovery_is_on()) {
      +               log_make_checkpoint();
                      mysql_mutex_lock(&log_sys.mutex);
              }
      

      The following test cases were failing in 10.6:

      innodb.innodb-alter-tempfile
      CURRENT_TEST: innodb.innodb-alter-tempfile
      mysqltest: At line 45: query 'show create table t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
       
      innodb.alter_crash 'innodb'              w4 [ fail ]
              Test ended at 2021-12-12 10:22:42
       
      CURRENT_TEST: innodb.alter_crash
      mysqltest: At line 125: query 'INSERT INTO t2 VALUES (5,6),(7,8)' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
       
      innodb.instant_alter_crash 'innodb'      w5 [ fail ]
              Test ended at 2021-12-12 10:23:28
       
      CURRENT_TEST: innodb.instant_alter_crash
      mysqltest: At line 200: query 'SHOW CREATE TABLE t3' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
       
      innodb.truncate_crash 'innodb'           w4 [ fail ]
              Test ended at 2021-12-12 10:24:16
       
      CURRENT_TEST: innodb.truncate_crash
      mysqltest: At line 21: query 'SELECT COUNT(*) FROM t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
      
      

      Attachments

      Issue Links

      Activity

            let datadir= `select @@datadir`;
             
            --let $_server_id= `SELECT @@server_id`
            --let $_expect_file_name=$MYSQLTEST_VARDIR/tmp/mysqld.$_server_id.expect
             
            CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL) ENGINE=innodb;
            SET debug_dbug='+d,innodb_alter_commit_crash_before_commit';
             
            --exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
             
            --error 2013
            ALTER TABLE t1 ADD PRIMARY KEY (f2, f1);
             
             
            --source include/start_mysqld.inc
             
            show create table t1;
            
            

            Before crashing:
            ================
            InnoDB does the following:
            1) create a table t1(table id is 18)
            2) create a table #sql-alter-10b0-4.ibd(table id is 19) as part of alter operation
            3) InnoDB renames t1.ibd to #sql-ib19.ibd
            4) InnoDB renames #sql-alter-10b0-4.ibd to t1.ibd
            5) InnoDB drops #sql-ib19.ibd
            6) Kills the server before alter finishes.

            Data directory has the following:
            =================================
            '#sql-alter-10b0-4.frm'  '#sql-ib19.ibd'   t1.frm   t1.ibd
            

            InnoDB sys tables
            ===================
            2 entries for table id 18: one is test/t1, the other is test/#sql-ib19.
            Both entries are delete-marked.

            2 entries for table id 19: one is test/#sql-alter-10b0-4, the other is test/t1.
            The first one is delete-marked, the second one exists.

            Recovery:
            =========

            InnoDB tries to load the table during trx_resurrect():

            • table id 18 fails to load because its SYS_TABLES records are delete-marked.
            • table id 19 loads successfully, and its tablespace (for t1) is loaded as well.

            Undo log processing done by InnoDB:

            InnoDB removes the new entries from SYS_TABLES.

            InnoDB then undoes the rename operation (t1 to #sql-alter-10b0-4).
            While opening the table, InnoDB fails to re-load the tablespace for #sql-alter-10b0-4.ibd
            due to the following code in dict_load_tablespace():

                    /* The tablespace may already be open. */
                    table->space = fil_space_for_table_exists_in_mem(table->space_id,
                                                                     table->flags);
                    if (table->space) {
                            return;
                    }
            

            and the rename itself succeeds. Now the data directory contains
            '#sql-alter-10b0-4.frm' '#sql-ib19.ibd' t1.frm '#sql-alter-10b0-4.ibd'

            InnoDB then tries to undo the rename operation (#sql-ib19 to t1):

            • InnoDB tries to open the tablespace for t1, fails to find the file in the data
              directory, and therefore skips the rename operation. Now InnoDB no longer has the table t1.
            thiru Thirunarayanan Balathandayuthapani added a comment

            I think that the following failure might be related to this:

            atomic.rename_trigger 'innodb'           w4 [ fail ]
                    Test ended at 2022-02-07 00:56:07
            CURRENT_TEST: atomic.rename_trigger
            mysqltest: At line 159: query 'let $res=`select t1.a+t2.b+t3.c+t4.d from t1,t2,t3,t4`' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
            "engine: innodb  crash point: ddl_log_rename_after_failed_rename_trigger  position: 3"
            More results from queries before failure can be found in /buildbot/amd64-ubuntu-1804-msan/build/mysql-test/var/4/log/rename_trigger.log
             - skipping '/buildbot/amd64-ubuntu-1804-msan/build/mysql-test/var/4/log/atomic.rename_trigger-innodb/'
            ***Warnings generated in error logs during shutdown after running tests: atomic.rename_trigger
            2022-02-07  0:56:06 0 [ERROR] InnoDB: Tablespace 5 was not found at ./test/t5.ibd.
            2022-02-07  0:56:06 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
            2022-02-07  0:56:06 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1445] with error Tablespace not found
            

            marko Marko Mäkelä added a comment

            Another possibly related failure:

            encryption.innodb-redo-badkey 'cbc,innodb,strict_crc32' w4 [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2022-02-06 22:48:09
            line
            2022-02-06 22:48:06 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 5
            2022-02-06 22:48:07 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 5
            

            marko Marko Mäkelä added a comment

            I tried this on 10.8 (after MDEV-14425), but it did not fail:

            diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
            index 0273fcd9a03..8f02ff51242 100644
            --- a/storage/innobase/fil/fil0fil.cc
            +++ b/storage/innobase/fil/fil0fil.cc
            @@ -1857,6 +1857,7 @@ static bool fil_rename_tablespace(uint32_t id, const char *old_path,
             	ut_ad(strchr(new_file_name, '/'));
             
             	if (!recv_recovery_is_on()) {
            +		log_make_checkpoint();
             		log_sys.latch.wr_lock(SRW_LOCK_CALL);
             	}
             
            

            10.8 e8a2a70cf869cf32a50144704646a5f34e8d7be9 (patched)

            innodb.instant_alter_crash 'innodb'      [ pass ]   2079
            innodb.alter_crash 'innodb'              [ pass ]   2489
            atomic.rename_trigger 'innodb'           [ pass ]  33901
            

            Also, the crash injection test case recovered fine. Does this still fail on the latest 10.6?

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - edited

            After I ran the crash injection test on 10.7 af87186c1d3f9eeab73bc38b1faa51ba17fa8017, the var/mysqld.1/data/test only contained db.opt, t1.frm, t1.ibd.

            Edit: With the patch, the recovery fails. (But not on 10.8 with the patch.)

            10.7 af87186c1d3f9eeab73bc38b1faa51ba17fa8017 (patched)

            mysqltest: At line 20: query 'show create table t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
            


            10.8 with MDEV-14425 but right before MDEV-27774 fails to recover:

            10.8 8c7c92adf384b2d0294502d5b38684e68d5eaf27 (patched)

            mysqltest: At line 20: query 'show create table t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
            

            marko Marko Mäkelä added a comment

            Problem exists on 10.6.2 also. The commit below is the cause of the issue:

            [~/mariarepo/10.6/10.6-sample]$git log -p -1 1bd681c8b3c
            commit 1bd681c8b3c5213ce1f7976940a7dc38b48a0d39
            Author: Marko Mäkelä <marko.makela@mariadb.com>
            Date:   Wed Jun 9 17:02:55 2021 +0300
             
                MDEV-25506 (3 of 3): Do not delete .ibd files before commit
            

            thiru Thirunarayanan Balathandayuthapani added a comment

            trx_resurrect_table_locks() fails to load the table because the sys table entries
            contain a delete-marked record for it. This leads to a failure to load tables during
            recovery.

            thiru Thirunarayanan Balathandayuthapani added a comment

            Thank you. I see that the following code in dict_load_table_on_id() makes an incorrect assumption:

            	if (page_rec_is_user_rec(rec)) {
            		/*---------------------------------------------------*/
            		/* Now we have the record in the secondary index
            		containing the table ID and NAME */
            check_rec:
            		field = rec_get_nth_field_old(
            			rec, DICT_FLD__SYS_TABLE_IDS__ID, &len);
            		ut_ad(len == 8);
             
            		/* Check if the table id in record is the one searched for */
            		if (table_id == mach_read_from_8(field)) {
            			if (rec_get_deleted_flag(rec, 0)) {
            				/* Until purge has completed, there
            				may be delete-marked duplicate records
            				for the same SYS_TABLES.ID, but different
            				SYS_TABLES.NAME. */
            				while (btr_pcur_move_to_next(&pcur, &mtr)) {
            					rec = btr_pcur_get_rec(&pcur);
             
            					if (page_rec_is_user_rec(rec)) {
            						goto check_rec;
            					}
            				}
            

            This code is essentially implementing READ UNCOMMITTED while we need READ COMMITTED here. It is wrong to filter out delete-marked secondary index records (records of SYS_TABLES.ID). The secondary index indeed only contains delete-marked records for this SYS_TABLES.ID=18: (ID,NAME)=(18,'test/#sql-ib19'),(18,'test/t1'). In the clustered index, I see a delete-marked record for the former and a non-delete-marked record for 'test/t1', with the wrong ID=19 for the most recent (uncommitted) version. Both records contain DB_TRX_ID=25.

            Implementing proper READ COMMITTED might be easiest by allocating a transaction with a read view, and adapting some logic from row_search_mvcc() for accessing the dictionary tables. The functions row_sel_sec_rec_is_for_clust_rec() and row_vers_build_for_consistent_read() have to be part of the solution.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - edited

            I think that we can implement "poor man’s MVCC" for the secondary index scan like this:

            diff --git a/storage/innobase/dict/dict0load.cc b/storage/innobase/dict/dict0load.cc
            index 71dedd48c29..88351b06201 100644
            --- a/storage/innobase/dict/dict0load.cc
            +++ b/storage/innobase/dict/dict0load.cc
            @@ -2536,11 +2536,16 @@ dict_load_table_on_id(
             
             		/* Check if the table id in record is the one searched for */
             		if (table_id == mach_read_from_8(field)) {
            -			if (rec_get_deleted_flag(rec, 0)) {
            -				/* Until purge has completed, there
            -				may be delete-marked duplicate records
            -				for the same SYS_TABLES.ID, but different
            -				SYS_TABLES.NAME. */
            +			field = rec_get_nth_field_old(rec,
            +				DICT_FLD__SYS_TABLE_IDS__NAME, &len);
            +			table = dict_sys.load_table(
            +				{reinterpret_cast<const char*>(field),
            +				 len}, ignore_err);
            +			if (table && table->id != table_id) {
            +				ut_ad(rec_get_deleted_flag(rec, 0));
            +				table = nullptr;
            +			}
            +			if (!table) {
             				while (btr_pcur_move_to_next(&pcur, &mtr)) {
             					rec = btr_pcur_get_rec(&pcur);
             
            @@ -2548,13 +2553,6 @@ dict_load_table_on_id(
             						goto check_rec;
             					}
             				}
            -			} else {
            -				/* Now we get the table name from the record */
            -				field = rec_get_nth_field_old(rec,
            -					DICT_FLD__SYS_TABLE_IDS__NAME, &len);
            -				table = dict_sys.load_table(
            -					{reinterpret_cast<const char*>(field),
            -					 len}, ignore_err);
             			}
             		}
             	}
            

            For the first matching record, dict_sys.load_table() would find nothing, because it is still performing READ UNCOMMITTED and that record is delete-marked. For the second one, it returns a table with the wrong id=19 because it fails to fetch the latest committed version.

            Edit: I was thinking to use purge_sys.view for the MVCC read, but I realized that READ COMMITTED semantics would require the newest possible read view instead of the oldest one.


            It may be much easier to roll back a recovered DDL transaction before resurrecting the table locks, by simply continuing to use READ UNCOMMITTED, which thanks to the added rollback would be equivalent to READ COMMITTED.

            If we implement READ COMMITTED, it will have to be implemented for accessing all dictionary tables. For example, if both the table and some indexed columns or base columns of virtual columns were renamed in a recovered incomplete DDL transaction, from each dictionary table we must fetch the correct names.

            marko Marko Mäkelä added a comment

            It turns out that we do have to implement the READ COMMITTED access to the data dictionary. Even if we defer trx_resurrect_table_locks() until after the trx_rollback_active(false) call that would roll back the DDL transaction, that very rollback would fail in its attempt to load the table definition.

            marko Marko Mäkelä added a comment

            Here is a test that halts the commit and then forces a checkpoint so that the FILE_ record recovery will not interfere with the processing of the undo logs and the data dictionary tables:

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
             
            CREATE TABLE t1 (a INT NOT NULL) ENGINE=InnoDB STATS_PERSISTENT=0;
             
            connect ddl,localhost,root;
            SET DEBUG_SYNC='after_trx_committed_in_memory SIGNAL stuck WAIT_FOR ever EXECUTE 2';
            send ALTER TABLE t1 ADD PRIMARY KEY(a);
             
            connection default;
            SET DEBUG_SYNC='now WAIT_FOR stuck';
            SET DEBUG_SYNC='now SIGNAL ever';
            SET DEBUG_SYNC='now WAIT_FOR stuck';
             
            SET GLOBAL innodb_log_checkpoint_now=ON;
             
            --let $shutdown_timeout=0
            --source include/restart_mysqld.inc
             
            disconnect ddl;
             
            SHOW CREATE TABLE t1;
            SELECT * FROM t1;
            DROP TABLE t1;
            

            The test actually passes (with SHOW CREATE TABLE wrongly indicating that the table includes a PRIMARY KEY), and an orphan table will be left behind:

            10.9 35fcae10408aa4c54967d118c8ccde1c3736b85f

            MTR's internal check of the test case 'main.MDEV-27234' failed.
            --- /dev/shm/10.9/mysql-test/var/tmp/check-mysqld_1.result	2022-03-10 10:53:59.764275437 +0200
            +++ /dev/shm/10.9/mysql-test/var/tmp/check-mysqld_1.reject	2022-03-10 10:54:00.352281258 +0200
            @@ -1092,3 +1092,4 @@
             partition	1.0	DISABLED	STORAGE ENGINE	100900.0	NULL	NULL	Mikael Ronstrom, MySQL AB	Partition Storage Engine Helper	GPL	OFF	Stable	1.0
             VARIABLE_NAME	VARIABLE_VALUE
             DEBUG_SYNC	ON - current signal: ''
            +#sql-ib19.ibd
             
            mysqltest: Result length mismatch
            

            On 10.6 36a19f94ce102be87b22fdc7ad147ab7d5ed4331 the test behaves in exactly the same way.

            marko Marko Mäkelä added a comment

            The only dictionary tables that contain any secondary indexes are SYS_TABLES and SYS_FOREIGN. The function dict_load_foreigns() does access the secondary index on SYS_FOREIGN.FOR_NAME, in the READ UNCOMMITTED fashion. We must not filter out delete-marked records there, and we must make dict_load_foreign() retrieve the last committed version, or return a status that no committed record existed.

            Every access to the clustered indexes must be corrected in a similar way, so that we will load the latest committed version (READ COMMITTED).

            I expect that the function trx_t::evict_table() can be removed as part of fixing this bug.

            marko Marko Mäkelä added a comment

            data.tar.xz is a data directory with innodb_page_size=64k that might demonstrate this bug. On accessing the table t3 (on which DDL was in progress when the server was killed), the following mismatch would be reported:

            10.6 dafc5fb9c17637c6233e8a73c890f158d645a37c

            2022-03-15 16:16:19 3 [ERROR] InnoDB could not find key no 0 with name Marvão_uidx1 from dict cache for table test/t3
            

            marko Marko Mäkelä added a comment

            Some adjustment may be needed for recovering instant ALTER TABLE metadata (MDEV-11369 and MDEV-15562). Currently, the rec_get_offsets() call in btr_cur_instant_init_low() assumes that dict_index_t::n_fields corresponds to the READ UNCOMMITTED version of the table definition. An assertion would fail during the first restart of the test innodb.instant_alter_crash.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment (edited)

            The function trx_t::evict_table() is necessary for evicting table definitions on the rollback of DDL transactions, for example, after a CREATE TABLE that failed due to incorrect FOREIGN KEY constraints.

            My current work-in-progress fix still shows the index mismatch message for data.tar.xz:

            2022-03-21 17:11:23 3 [ERROR] Cannot find index Marvão_uidx1 in InnoDB index dictionary.
            2022-03-21 17:11:23 3 [ERROR] InnoDB indexes are inconsistent with what defined in .frm for table ./test/t3
            2022-03-21 17:11:23 3 [ERROR] InnoDB could not find key no 0 with name Marvão_uidx1 from dict cache for table test/t3
            2022-03-21 17:11:23 3 [ERROR] InnoDB: Table test/t3 contains 3 indexes inside InnoDB, which is different from the number of indexes 3 defined in the .frm file. See https://mariadb.com/kb/en/innodb-troubleshooting/
            


            I spent a lot of time on the test case before I confirmed with rr replay that the ADD PRIMARY KEY had actually been committed before the server was killed. The actual reason why the #sql-ib file was not dropped is that the fil_delete_tablespace() call in row_purge_remove_clust_if_poss_low() did not find the tablespace, because dict_check_sys_tables() was skipping delete-marked records of SYS_TABLES.

            My work-in-progress fix of that broke a number of mtr tests (mostly IMPORT TABLESPACE) and still fails to recover data.tar.xz correctly.
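The failure mode just described can be illustrated with a toy model (all names here are hypothetical stand-ins, not the real fil_system or dict_check_sys_tables() implementation): if the recovery scan of SYS_TABLES skips delete-marked records, the tablespace of a dropped-but-not-yet-purged table is never registered, and a later purge cannot find the file to delete.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for the fil_system registry of open tablespaces.
struct FilSystem {
  std::map<uint64_t, std::string> spaces;  // space_id -> file name
  // Models fil_delete_tablespace(): returns false when the
  // tablespace was never registered ("Tablespace is missing").
  bool delete_tablespace(uint64_t id) { return spaces.erase(id) != 0; }
};

struct SysTablesRec {
  uint64_t space_id;
  std::string name;
  bool delete_marked;
};

// Toy stand-in for the SYS_TABLES scan in dict_check_sys_tables().
void scan_sys_tables(const SysTablesRec& rec, FilSystem& fil,
                     bool skip_delete_marked) {
  if (skip_delete_marked && rec.delete_marked)
    return;  // the buggy behaviour: the tablespace is never registered
  fil.spaces.emplace(rec.space_id, rec.name);
}
```

With skip_delete_marked set, a later delete_tablespace() for the #sql-ib file fails because the space was never registered; without it, purge can find and drop the file.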

            marko Marko Mäkelä added a comment

            The data.tar.xz case cannot be helped by this fix. The warning message mentions an unexpectedly missing index. In fact, in dict_load_index_low(), a delete-marked record of SYS_INDEXES for it was encountered and correctly filtered out in my development branch, because no active transaction for the DB_TRX_ID was found. We would need an rr replay trace of such a failure to understand why the DDL recovery apparently chose the wrong .frm file when recovering from a crash during DROP INDEX Marvão_uidx1.

            marko Marko Mäkelä added a comment

            I was able to fix the recovery of data.tar.xz. It turned out that there was a SYS_INDEXES record for an ADD INDEX stub that had been deleted by a committed transaction. The table->def_trx_id was wrongly updated to the DB_TRX_ID of that record, and InnoDB wrongly informed the DDL log recovery that the ALTER TABLE transaction had been committed inside InnoDB. Yes, we must pay attention to the DB_TRX_ID of committed delete-marked SYS_INDEXES records, but only if the SYS_INDEXES.NAME does not identify the index as incomplete (by starting with a 0xff byte).
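The 0xff rule above can be sketched as a small predicate (illustrative names only, not the actual InnoDB symbols): a SYS_INDEXES.NAME beginning with the byte 0xff marks an index stub that was never completed by ALTER TABLE, and a committed delete-marked record for such a stub must not advance table->def_trx_id.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the rule described above. An index whose
// SYS_INDEXES.NAME starts with the byte 0xff is an incomplete
// ADD INDEX stub.
inline bool index_is_incomplete(const std::string& sys_indexes_name) {
  return !sys_indexes_name.empty() &&
         static_cast<unsigned char>(sys_indexes_name[0]) == 0xff;
}

// Pay attention to the DB_TRX_ID of a committed delete-marked
// SYS_INDEXES record only when the name identifies a complete index.
inline bool should_bump_def_trx_id(bool delete_marked,
                                   const std::string& name) {
  return delete_marked && !index_is_incomplete(name);
}
```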

            For the crash in the recovery of the test innodb.instant_alter_crash I got an idea: btr_cur_instant_init_low() should perform a READ COMMITTED of the instant ADD COLUMN metadata record. In the case of the crash, that record had been inserted by an uncommitted transaction.

            marko Marko Mäkelä added a comment
            mleich Matthias Leich added a comment (edited)

            origin/bb-10.6-MDEV-27234 13dfdd3953c076177f8b61355cb70a887a958b4b 2022-03-24T12:13:54+02:00
            behaved well in RQG testing (a battery for broad-range coverage only).


            People

              marko Marko Mäkelä
              thiru Thirunarayanan Balathandayuthapani
              Votes: 0
              Watchers: 3

