[MDEV-21549] IMPORT TABLESPACE fails to adjust all tablespace ID in root pages
Created: 2020-01-21  Updated: 2020-05-06  Resolved: 2020-03-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.1.45, 10.2.32, 10.3.23, 10.4.13, 10.5.3

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 2
Labels: affects-tests, crash

Issue Links:
Blocks
blocks MDEV-18762 Support easy restore of partial backup Closed
Duplicate
is duplicated by MDEV-22431 IMPORT TABLESPACE, service restart Closed
is duplicated by MDEV-22481 Server crash after access to table wi... Closed
Problem/Incident
is caused by MDEV-18295 IMPORT TABLESPACE fails with instant-... Closed
Relates
relates to MDEV-16283 ALTER TABLE...DISCARD TABLESPACE stil... Closed
relates to MDEV-21407 Crash when restarting server after IM... Closed
relates to MDEV-18543 IMPORT TABLESPACE fails after instant... Closed

 Description   

I happened to get this crash today once on a 10.5-based branch.

10.5

CURRENT_TEST: innodb.innodb-wl5522-debug
mysqltest: At line 1138: query 'ALTER TABLE test_wl5522.t1 IMPORT TABLESPACE' failed: 2013: Lost connection to MySQL server during query
2020-01-21 15:20:37 3 [Note] InnoDB: Phase IV - Flush complete
2020-01-21 15:20:37 3 [Note] InnoDB: `test_wl5522`.`t1` autoinc value set to 248
2020-01-21 15:20:37 0x7f10499c9700  InnoDB: Assertion failure in file /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0btr.cc line 204
InnoDB: Failing assertion: mach_read_from_4(seg_header + FSEG_HDR_SPACE) == space
#6  0x00005599e08d7f49 in btr_root_block_get (index=0x7f101c0e9c88, mode=<optimized out>, mtr=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0btr.cc:243
#7  0x00005599e08d7f8e in btr_root_get (index=0x2, mtr=0x0) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0btr.cc:271
#8  0x00005599e0903b90 in btr_cur_instant_init_low (index=<optimized out>, mtr=0x7f10499c5f90) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0cur.cc:409
#9  btr_cur_instant_init (table=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0cur.cc:662
#10 0x00005599e09bfaea in dict_load_table_one (name=..., ignore_err=<optimized out>, fk_tables=...) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0load.cc:3011
#11 0x00005599e09bd4ae in dict_load_table (name=0x7f101c071518 "test_wl5522/t1", ignore_err=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0load.cc:2742
#12 0x00005599e09c0e0f in dict_load_table_on_id (table_id=<optimized out>, ignore_err=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0load.cc:3179
#13 0x00005599e09a439f in dict_table_open_on_id_low (table_id=86, ignore_err=1234983936, cached_only=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0dict.cc:225
#14 dict_table_open_on_id (table_id=86, dict_locked=true, table_op=DICT_TABLE_OP_NORMAL, thd=0x0, mdl=0x0) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0dict.cc:933
#15 0x00005599e06999dc in ha_innobase::discard_or_import_tablespace (this=0x7f101c11a520, discard=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/handler/ha_innodb.cc:13396
#16 0x00005599e01c1d1e in mysql_discard_or_import_tablespace (thd=0x7f101c000cf8, table_list=0x7f101c014c98, discard=false) at /mariadb/10.5-MDEV-12353/sql/sql_table.cc:5972
#17 0x00005599e023f2ab in Sql_cmd_discard_import_tablespace::execute (this=0x7f101c015358, thd=0x7f101c000cf8) at /mariadb/10.5-MDEV-12353/sql/sql_alter.cc:559
#18 0x00005599e010a8ac in mysql_execute_command (thd=0x7f101c000cf8) at /mariadb/10.5-MDEV-12353/sql/sql_parse.cc:5959
#19 0x00005599e0105818 in mysql_parse (thd=0x7f101c000cf8, rawbuf=0x7f101c014b80 "ALTER TABLE test_wl5522.t1 IMPORT TABLESPACE", length=<optimized out>, parser_state=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /mariadb/10.5-MDEV-12353/sql/sql_parse.cc:7988

The following code was added to ha_innobase::discard_or_import_tablespace() in MDEV-18295:

       /* Evict and reload the table definition in order to invoke
       btr_cur_instant_init(). */
       table_id_t id = m_prebuilt->table->id;
       ut_ad(id);
       mutex_enter(&dict_sys->mutex);
       dict_table_close(m_prebuilt->table, TRUE, FALSE);
       dict_table_remove_from_cache(m_prebuilt->table);
       m_prebuilt->table = dict_table_open_on_id(id, TRUE,
                                                 DICT_TABLE_OP_NORMAL);

I believe that the check called by btr_cur_instant_init_low() must be relaxed. It is a hard assertion, affecting non-debug builds as well, because UNIV_BTR_DEBUG is always enabled. That is why I am setting this to Critical.

We should not crash, but return a failure to the caller of btr_cur_instant_init_low(). This may require specializing the btr_root_get() call.

Furthermore, when opening the table during IMPORT TABLESPACE, we must either suppress the tablespace ID validation, or we should temporarily set table->space_id to the ID that is present in the tablespace file.

Last but not least, during IMPORT TABLESPACE, the call to btr_cur_instant_init_low() must not fetch an older page from the buffer pool (it might be there after DISCARD TABLESPACE), but actually read the page from the file. See also the related 10.4+ bug MDEV-18543.
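The idea of returning a failure instead of asserting could look roughly like the following. This is only a minimal, self-contained sketch and not the actual InnoDB code: `root_check` and `check_root_space_id()` are hypothetical names, and the page offsets are the values I recall from the InnoDB headers, so they should be treated as assumptions and verified against the branch in question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Offsets as recalled from the InnoDB headers (assumption; verify against
// fil0fil.h, page0page.h and fsp0fsp.h of the branch you are working on).
constexpr std::size_t FIL_PAGE_DATA = 38;      // start of the page header
constexpr std::size_t PAGE_BTR_SEG_LEAF = 36;  // leaf segment header
constexpr std::size_t PAGE_BTR_SEG_TOP = 46;   // non-leaf segment header
constexpr std::size_t FSEG_HDR_SPACE = 0;      // space ID inside a segment header

// Hypothetical result type for this sketch; not an InnoDB type.
enum class root_check { ok, corrupted };

// Read a 32-bit big-endian integer, in the spirit of mach_read_from_4().
inline std::uint32_t read_be32(const unsigned char* b)
{
  return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
    | (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Hypothetical helper: compare the space ID stored in both segment headers
// of a root page with the expected one, and report a mismatch to the caller
// instead of aborting the server.
inline root_check check_root_space_id(const unsigned char* page,
                                      std::uint32_t space)
{
  for (std::size_t seg : {PAGE_BTR_SEG_LEAF, PAGE_BTR_SEG_TOP}) {
    if (read_be32(page + FIL_PAGE_DATA + seg + FSEG_HDR_SPACE) != space) {
      return root_check::corrupted;
    }
  }
  return root_check::ok;
}
```

A caller such as btr_cur_instant_init_low() could then translate `root_check::corrupted` into an error return rather than crashing; how that error would be propagated is left open here.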



 Comments   
Comment by Marko Mäkelä [ 2020-01-21 ]

One explanation for why the test crashes so rarely could be ‘stale’ pages left in the buffer pool after DISCARD TABLESPACE by MDEV-16283.

Comment by Marko Mäkelä [ 2020-03-20 ]

I am taking this over, because the test innodb_zip.wl5522_debug_zip fails every time on 10.5 kvm-asan due to this.

I was not able to repeat the failure on a clone of the VM image, but I was able to repeat a failure of innodb.innodb-wl5522-debug locally again.

Comment by Marko Mäkelä [ 2020-03-20 ]

It looks like ALTER TABLE…IMPORT TABLESPACE was always broken in this way. PageConverter::update_index_page() never updated all 3 copies of the tablespace identifier in the index root pages. InnoDB generally ignored the wasted 4+4 bytes in BTR_SEG_TOP and BTR_SEG_LEAF (which are wasted not only in B-tree root pages, but in every B-tree page header!).
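To make the "3 copies" concrete, here is a minimal sketch of where they would live in an uncompressed root page. The comment does not spell them out; my reading is that they are the space ID in the FIL page header plus the FSEG_HDR_SPACE field of each of the two segment headers. `set_root_space_id()` is a hypothetical helper, and the numeric offsets are recalled from the InnoDB headers, so treat both as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets as recalled from the InnoDB headers (assumption; verify against
// fil0fil.h, page0page.h and fsp0fsp.h of the branch you are working on).
constexpr std::size_t FIL_PAGE_SPACE_ID = 34;  // space ID in the FIL header
constexpr std::size_t FIL_PAGE_DATA = 38;      // start of the page header
constexpr std::size_t PAGE_BTR_SEG_LEAF = 36;  // leaf segment header
constexpr std::size_t PAGE_BTR_SEG_TOP = 46;   // non-leaf segment header
constexpr std::size_t FSEG_HDR_SPACE = 0;      // space ID inside a segment header

// Write a 32-bit big-endian integer, in the spirit of mach_write_to_4().
inline void write_be32(unsigned char* b, std::uint32_t v)
{
  b[0] = static_cast<unsigned char>(v >> 24);
  b[1] = static_cast<unsigned char>(v >> 16);
  b[2] = static_cast<unsigned char>(v >> 8);
  b[3] = static_cast<unsigned char>(v);
}

// Read a 32-bit big-endian integer, in the spirit of mach_read_from_4().
inline std::uint32_t read_be32(const unsigned char* b)
{
  return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
    | (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Hypothetical helper: stamp the new space ID into all three places of an
// uncompressed root page (FIL header, leaf and non-leaf segment headers).
inline void set_root_space_id(unsigned char* page, std::uint32_t space)
{
  write_be32(page + FIL_PAGE_SPACE_ID, space);
  unsigned char* leaf = page + FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF
    + FSEG_HDR_SPACE;
  write_be32(leaf, space);
  // Copy the same 4 bytes into the non-leaf segment header.
  std::memcpy(page + FIL_PAGE_DATA + PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE,
              leaf, 4);
}
```

For ROW_FORMAT=COMPRESSED pages, the same bytes would presumably also have to be mirrored into the compressed page image; this sketch deliberately leaves that out.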

I was able to rather reliably repeat the error with the following test case:

--source include/have_innodb.inc
--source include/default_charset.inc
--source include/have_sequence.inc
 
let MYSQLD_DATADIR =`SELECT @@datadir`;
 
#
# Create a large table with delete marked records, disable purge during
# the update so that we can test the IMPORT purge code.
#
CREATE TABLE t1 (
	c1 SERIAL,
	c2 BIGINT,
	c3 VARCHAR(2048),
	c4 VARCHAR(2048),
	INDEX idx1(c2),
	INDEX idx2(c3(512)),
	INDEX idx3(c4(512))) Engine=InnoDB;
 
# Stop purge so that it doesn't remove the delete marked entries.
connect (purge_control,localhost,root);
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connection default;
 
INSERT INTO t1
SELECT 1 + seq, 1 + (seq MOD 4),
 REPEAT(SUBSTR('abcd', 1 + (seq MOD 4), 1), 2048),
 REPEAT(SUBSTR('abcd', 1 + (seq MOD 4), 1), 2048)
FROM seq_0_to_127;
 
--disable_query_log
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c3 = REPEAT("c2", 1024);
UPDATE t1 SET c4 = REPEAT("c4", 1024);
--enable_query_log
 
FLUSH TABLES t1 FOR EXPORT;
 
perl;
do "$ENV{MTR_SUITE_DIR}/include/innodb-util.pl";
ib_backup_tablespaces("test", "t1");
EOF
 
UNLOCK TABLES;
 
# Enable normal operation
connection purge_control;
COMMIT;
disconnect purge_control;
connection default;
 
DROP TABLE t1;
 
CREATE TABLE t1 (
	c1 SERIAL,
	c2 BIGINT,
	c3 VARCHAR(2048),
	c4 VARCHAR(2048),
	INDEX idx1(c2),
	INDEX idx2(c3(512)),
	INDEX idx3(c4(512))) Engine=InnoDB;
 
ALTER TABLE t1 DISCARD TABLESPACE;
 
perl;
do "$ENV{MTR_SUITE_DIR}/include/innodb-util.pl";
ib_restore_tablespaces("test", "t1");
EOF
 
ALTER TABLE t1 IMPORT TABLESPACE;
DROP TABLE t1;

It would typically fail on the first or second round. It is unclear why this started to fail just now. Maybe MDEV-12353 changed the I/O patterns enough to expose it.

The consistency check was originally added along with the function btr_root_block_get(), probably by me, in Oracle’s InnoDB Plugin for MySQL 5.1.

With the following fix, the crash goes away:

diff --git a/storage/innobase/row/row0import.cc b/storage/innobase/row/row0import.cc
index 494a8b3b729..3c5e27f5b27 100644
--- a/storage/innobase/row/row0import.cc
+++ b/storage/innobase/row/row0import.cc
@@ -1913,6 +1913,23 @@ PageConverter::update_index_page(
 		return(DB_SUCCESS);
 	}
 
+	if (m_index && block->page.id.page_no() == m_index->m_page_no) {
+		byte *b = FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF + FSEG_HDR_SPACE
+			+ page;
+		mach_write_to_4(b, block->page.id.space());
+
+		memcpy(FIL_PAGE_DATA + PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE
+		       + page, b, 4);
+		if (UNIV_LIKELY_NULL(block->page.zip.data)) {
+			memcpy(&block->page.zip.data[FIL_PAGE_DATA
+						     + PAGE_BTR_SEG_TOP
+						     + FSEG_HDR_SPACE], b, 4);
+			memcpy(&block->page.zip.data[FIL_PAGE_DATA
+						     + PAGE_BTR_SEG_LEAF
+						     + FSEG_HDR_SPACE], b, 4);
+		}
+	}
+
 #ifdef UNIV_ZIP_DEBUG
 	ut_a(!block->page.zip.data || page_zip_validate(&block->page.zip, page,
 							m_index->m_srv_index));

Comment by Marko Mäkelä [ 2020-03-20 ]

I pushed the fix to 10.5, to get a clean kvm-asan for the 10.5.2 release, without having to wait for several days for a merge from 10.2.

While chasing this down, I worked on cleaning up the IMPORT tests. I plan to push that cleanup along with the fix to the 10.2 branch. I will close the bug once that is done.

Comment by Marko Mäkelä [ 2020-03-20 ]

I spent some additional effort to clean up the import/export tests, hoping to make them run slightly faster.
