I happened to get this crash once today on a 10.5-based branch.
10.5
CURRENT_TEST: innodb.innodb-wl5522-debug
mysqltest: At line 1138: query 'ALTER TABLE test_wl5522.t1 IMPORT TABLESPACE' failed: 2013: Lost connection to MySQL server during query
…
2020-01-21 15:20:37 3 [Note] InnoDB: Phase IV - Flush complete
2020-01-21 15:20:37 3 [Note] InnoDB: `test_wl5522`.`t1` autoinc value set to 248
2020-01-21 15:20:37 0x7f10499c9700 InnoDB: Assertion failure in file /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0btr.cc line 204
InnoDB: Failing assertion: mach_read_from_4(seg_header + FSEG_HDR_SPACE) == space
…
#6 0x00005599e08d7f49 in btr_root_block_get (index=0x7f101c0e9c88, mode=<optimized out>, mtr=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0btr.cc:243
#7 0x00005599e08d7f8e in btr_root_get (index=0x2, mtr=0x0) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0btr.cc:271
#8 0x00005599e0903b90 in btr_cur_instant_init_low (index=<optimized out>, mtr=0x7f10499c5f90) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0cur.cc:409
#9 btr_cur_instant_init (table=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/btr/btr0cur.cc:662
#10 0x00005599e09bfaea in dict_load_table_one (name=..., ignore_err=<optimized out>, fk_tables=...) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0load.cc:3011
#11 0x00005599e09bd4ae in dict_load_table (name=0x7f101c071518 "test_wl5522/t1", ignore_err=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0load.cc:2742
#12 0x00005599e09c0e0f in dict_load_table_on_id (table_id=<optimized out>, ignore_err=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0load.cc:3179
#13 0x00005599e09a439f in dict_table_open_on_id_low (table_id=86, ignore_err=1234983936, cached_only=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0dict.cc:225
#14 dict_table_open_on_id (table_id=86, dict_locked=true, table_op=DICT_TABLE_OP_NORMAL, thd=0x0, mdl=0x0) at /mariadb/10.5-MDEV-12353/storage/innobase/dict/dict0dict.cc:933
#15 0x00005599e06999dc in ha_innobase::discard_or_import_tablespace (this=0x7f101c11a520, discard=<optimized out>) at /mariadb/10.5-MDEV-12353/storage/innobase/handler/ha_innodb.cc:13396
#16 0x00005599e01c1d1e in mysql_discard_or_import_tablespace (thd=0x7f101c000cf8, table_list=0x7f101c014c98, discard=false) at /mariadb/10.5-MDEV-12353/sql/sql_table.cc:5972
#17 0x00005599e023f2ab in Sql_cmd_discard_import_tablespace::execute (this=0x7f101c015358, thd=0x7f101c000cf8) at /mariadb/10.5-MDEV-12353/sql/sql_alter.cc:559
#18 0x00005599e010a8ac in mysql_execute_command (thd=0x7f101c000cf8) at /mariadb/10.5-MDEV-12353/sql/sql_parse.cc:5959
I believe that the check called by btr_cur_instant_init_low() must be relaxed. It is a hard assertion, affecting non-debug builds as well, because UNIV_BTR_DEBUG is always enabled. That is why I am setting this to Critical.
We should not crash, but return a failure to the caller of btr_cur_instant_init_low(). This may require specializing the btr_root_get() call.
Furthermore, when opening the table during IMPORT TABLESPACE, we must either suppress the tablespace ID validation or temporarily set table->space_id to the ID that is present in the tablespace file.
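A minimal sketch of the first two suggestions, in C++ like the surrounding code: the hard assertion becomes an error code that the caller propagates, and a flag lets the table-open path used by IMPORT TABLESPACE skip the tablespace ID check. Err, RootSegHeaders, check_root_seg_headers() and load_index_root() are hypothetical stand-ins, not InnoDB's API (the real code reads the IDs from the page with mach_read_from_4() and reports errors via dberr_t). The alternative of temporarily setting table->space_id is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for InnoDB's dberr_t.
enum class Err { SUCCESS, CORRUPTION };

// Hypothetical view of the tablespace IDs stored in the leaf and
// non-leaf file segment headers of an index root page.
struct RootSegHeaders {
  uint32_t leaf_space;
  uint32_t top_space;
};

// Report a mismatch instead of failing a hard assertion. When
// skip_space_check is set (for example while opening the table during
// IMPORT TABLESPACE), the comparison is suppressed.
Err check_root_seg_headers(const RootSegHeaders &h, uint32_t expected_space,
                           bool skip_space_check) {
  if (skip_space_check) return Err::SUCCESS;
  if (h.leaf_space != expected_space || h.top_space != expected_space)
    return Err::CORRUPTION;
  return Err::SUCCESS;
}

// A caller in the role of btr_cur_instant_init_low(): it passes the
// error up so that the statement fails instead of the server crashing.
Err load_index_root(const RootSegHeaders &h, uint32_t expected_space,
                    bool skip_space_check) {
  Err err = check_root_seg_headers(h, expected_space, skip_space_check);
  if (err != Err::SUCCESS) return err;
  // ... further processing of the root page would go here ...
  return Err::SUCCESS;
}

// Usage: IDs left over from the exported tablespace (5) versus the
// importing table's tablespace ID (7).
void demo_root_check() {
  RootSegHeaders stale{5, 5};
  assert(load_index_root(stale, 7, false) == Err::CORRUPTION);
  assert(load_index_root(stale, 7, true) == Err::SUCCESS);
  assert(load_index_root(RootSegHeaders{7, 7}, 7, false) == Err::SUCCESS);
}
```

Returning an error keeps the decision about how to report the corruption with the caller, which is what "return a failure to the caller of btr_cur_instant_init_low()" asks for.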
Last but not least, during IMPORT TABLESPACE, the call to btr_cur_instant_init_low() must not fetch an older page from the buffer pool (it might be there after DISCARD TABLESPACE), but actually read the page from the file. See also the related 10.4+ bug MDEV-18543.
One explanation why the test crashes so rarely could be ‘stale’ pages left in the buffer pool after DISCARD TABLESPACE by MDEV-16283.
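The stale-page scenario from the last two paragraphs can be modelled with a toy cache keyed by (space_id, page_no). This is only an illustration under stated assumptions, not InnoDB's buffer pool: ToyBufferPool, read() and read_from_file() are hypothetical, and whether DISCARD TABLESPACE really leaves such pages behind is exactly the open question raised above (MDEV-16283).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy model: a page is identified by (space_id, page_no) and its
// contents are just a string.
using PageId = std::pair<uint32_t, uint32_t>;

struct ToyBufferPool {
  std::map<PageId, std::string> file;   // what is in the .ibd "file"
  std::map<PageId, std::string> cache;  // the "buffer pool"

  // Normal page access: a cached copy wins over the file.
  std::string read(const PageId &id) {
    auto it = cache.find(id);
    if (it != cache.end()) return it->second;
    std::string page = file.at(id);
    cache[id] = page;
    return page;
  }

  // Bypass the cache and refresh it from the file, which is what
  // IMPORT TABLESPACE would need for the pages it validates.
  std::string read_from_file(const PageId &id) {
    std::string page = file.at(id);
    cache[id] = page;
    return page;
  }

  // Modelled DISCARD TABLESPACE: the file contents go away, but any
  // cached pages of the tablespace are (deliberately) left behind.
  void discard(uint32_t space_id) {
    for (auto it = file.begin(); it != file.end();) {
      if (it->first.first == space_id)
        it = file.erase(it);
      else
        ++it;
    }
  }
};

// Usage: the page cached before DISCARD shadows the imported file.
void demo_stale_page() {
  ToyBufferPool bp;
  PageId root(7, 3);  // arbitrary page number
  bp.file[root] = "old root";
  assert(bp.read(root) == "old root");  // now cached
  bp.discard(7);
  bp.file[root] = "imported root";      // the .ibd file is copied back in
  assert(bp.read(root) == "old root");  // stale buffer pool copy
  assert(bp.read_from_file(root) == "imported root");
  assert(bp.read(root) == "imported root");
}
```

Because the stale copy is only present if it was not evicted in the meantime, a model like this is at least consistent with the crash being rare.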
Marko Mäkelä
I am taking this over, because the test innodb_zip.wl5522_debug_zip fails every time on 10.5 kvm-asan due to this.
I was not able to repeat the failure on a clone of the VM image, but I was able to repeat a failure of innodb.innodb-wl5522-debug locally again.
Marko Mäkelä
It looks like ALTER TABLE…IMPORT TABLESPACE was always broken in this way. PageConverter::update_index_page() never updated all 3 copies of the tablespace identifier in the index root pages. InnoDB generally ignored the wasted 4+4 bytes in BTR_SEG_TOP and BTR_SEG_LEAF (which are wasted not only in B-tree root pages, but in every B-tree page header!).
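To make the "3 copies" concrete: an index root page stores its tablespace ID in the FIL page header and in the FSEG_HDR_SPACE field of both file segment headers (leaf and non-leaf) inside the index page header. The sketch below reads all three from a raw page image. The numeric offsets are assumed to match InnoDB's fil0fil.h, page0page.h and fsp0types.h and should be checked against the source tree; RootSpaceIds, root_space_ids() and the byte helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets assumed to match InnoDB (fil0fil.h, page0page.h, fsp0types.h).
constexpr std::size_t FIL_PAGE_SPACE_ID = 34;  // in the FIL page header
constexpr std::size_t FIL_PAGE_DATA = 38;      // index page header starts here
constexpr std::size_t PAGE_BTR_SEG_LEAF = 36;  // relative to the page header
constexpr std::size_t FSEG_HEADER_SIZE = 10;
constexpr std::size_t PAGE_BTR_SEG_TOP = PAGE_BTR_SEG_LEAF + FSEG_HEADER_SIZE;
constexpr std::size_t FSEG_HDR_SPACE = 0;      // relative to a segment header

// Big-endian accessors, in the spirit of mach_read_from_4()/mach_write_to_4().
uint32_t read_be32(const unsigned char *b) {
  return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16) |
         (uint32_t(b[2]) << 8) | uint32_t(b[3]);
}

void write_be32(unsigned char *b, uint32_t v) {
  b[0] = static_cast<unsigned char>(v >> 24);
  b[1] = static_cast<unsigned char>(v >> 16);
  b[2] = static_cast<unsigned char>(v >> 8);
  b[3] = static_cast<unsigned char>(v);
}

// The three tablespace ID copies on an index root page.
struct RootSpaceIds {
  uint32_t fil_header;  // FIL_PAGE_SPACE_ID
  uint32_t seg_leaf;    // PAGE_BTR_SEG_LEAF + FSEG_HDR_SPACE
  uint32_t seg_top;     // PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE
};

RootSpaceIds root_space_ids(const unsigned char *page) {
  return {read_be32(page + FIL_PAGE_SPACE_ID),
          read_be32(page + FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF + FSEG_HDR_SPACE),
          read_be32(page + FIL_PAGE_DATA + PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE)};
}

bool root_space_ids_agree(const unsigned char *page) {
  RootSpaceIds ids = root_space_ids(page);
  return ids.fil_header == ids.seg_leaf && ids.fil_header == ids.seg_top;
}

// Usage: a root page on which only the FIL header copy was rewritten.
void demo_partial_conversion() {
  std::vector<unsigned char> page(16384, 0);
  write_be32(&page[FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF + FSEG_HDR_SPACE], 5);
  write_be32(&page[FIL_PAGE_DATA + PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE], 5);
  write_be32(&page[FIL_PAGE_SPACE_ID], 7);
  assert(!root_space_ids_agree(page.data()));
  RootSpaceIds ids = root_space_ids(page.data());
  assert(ids.fil_header == 7 && ids.seg_leaf == 5 && ids.seg_top == 5);
}
```

A page in the state of demo_partial_conversion() is what the failing assertion mach_read_from_4(seg_header + FSEG_HDR_SPACE) == space would reject when space is the new ID.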
I was able to repeat the error fairly reliably with the following test case:
--source include/have_innodb.inc
--source include/default_charset.inc
--source include/have_sequence.inc
let MYSQLD_DATADIR =`SELECT @@datadir`;
#
# Create a large table with delete marked records, disable purge during
# the update so that we can test the IMPORT purge code.
#
CREATE TABLE t1 (
c1 SERIAL,
c2 BIGINT,
c3 VARCHAR(2048),
c4 VARCHAR(2048),
INDEX idx1(c2),
INDEX idx2(c3(512)),
INDEX idx3(c4(512))) Engine=InnoDB;
# Stop purge so that it doesn't remove the delete marked entries.
connect (purge_control,localhost,root);
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connection default;
INSERT INTO t1
SELECT 1 + seq, 1 + (seq MOD 4),
REPEAT(SUBSTR('abcd', 1 + (seq MOD 4), 1), 2048),
REPEAT(SUBSTR('abcd', 1 + (seq MOD 4), 1), 2048)
FROM seq_0_to_127;
--disable_query_log
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c2 = c2 + c1;
UPDATE t1 SET c3 = REPEAT("c2", 1024);
UPDATE t1 SET c4 = REPEAT("c4", 1024);
--enable_query_log
FLUSH TABLES t1 FOR EXPORT;
perl;
do "$ENV{MTR_SUITE_DIR}/include/innodb-util.pl";
ib_backup_tablespaces("test", "t1");
EOF
UNLOCK TABLES;
# Enable normal operation
connection purge_control;
COMMIT;
disconnect purge_control;
connection default;
DROP TABLE t1;
CREATE TABLE t1 (
c1 SERIAL,
c2 BIGINT,
c3 VARCHAR(2048),
c4 VARCHAR(2048),
INDEX idx1(c2),
INDEX idx2(c3(512)),
INDEX idx3(c4(512))) Engine=InnoDB;
ALTER TABLE t1 DISCARD TABLESPACE;
perl;
do "$ENV{MTR_SUITE_DIR}/include/innodb-util.pl";
ib_restore_tablespaces("test", "t1");
EOF
ALTER TABLE t1 IMPORT TABLESPACE;
DROP TABLE t1;
It would typically fail on the first or second round. It is unclear why this only started to fail now; maybe MDEV-12353 changed the I/O patterns enough.
The consistency check was originally added along with the function btr_root_block_get(), probably by me, in Oracle’s InnoDB Plugin for MySQL 5.1.
Marko Mäkelä
With the following fix, the crash goes away:
diff --git a/storage/innobase/row/row0import.cc b/storage/innobase/row/row0import.cc
index 494a8b3b729..3c5e27f5b27 100644
--- a/storage/innobase/row/row0import.cc
+++ b/storage/innobase/row/row0import.cc
@@ -1913,6 +1913,23 @@ PageConverter::update_index_page(
return(DB_SUCCESS);
}
+ if (m_index && block->page.id.page_no() == m_index->m_page_no) {
+ byte *b = FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF + FSEG_HDR_SPACE
+ + page;
+ mach_write_to_4(b, block->page.id.space());
+
+ memcpy(FIL_PAGE_DATA + PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE
+ + page, b, 4);
+ if (UNIV_LIKELY_NULL(block->page.zip.data)) {
+ memcpy(&block->page.zip.data[FIL_PAGE_DATA
+ + PAGE_BTR_SEG_TOP
+ + FSEG_HDR_SPACE], b, 4);
+ memcpy(&block->page.zip.data[FIL_PAGE_DATA
+ + PAGE_BTR_SEG_LEAF
+ + FSEG_HDR_SPACE], b, 4);
+ }
+ }
+
#ifdef UNIV_ZIP_DEBUG
ut_a(!block->page.zip.data || page_zip_validate(&block->page.zip, page,
m_index->m_srv_index));
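The core of the diff can be restated over plain byte buffers, which may help when reading it out of context. This is a sketch, not the committed code: set_root_seg_space_ids(), SEG_LEAF_SPACE, SEG_TOP_SPACE and the byte helpers are hypothetical, the offsets are assumed values of the InnoDB constants, and the real patch operates on a buf_block_t, uses mach_write_to_4(), and applies only to the page whose number equals m_index->m_page_no.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offsets assumed to match the InnoDB constants used in the diff.
constexpr std::size_t FIL_PAGE_DATA = 38;
constexpr std::size_t PAGE_BTR_SEG_LEAF = 36;
constexpr std::size_t FSEG_HEADER_SIZE = 10;
constexpr std::size_t PAGE_BTR_SEG_TOP = PAGE_BTR_SEG_LEAF + FSEG_HEADER_SIZE;
constexpr std::size_t FSEG_HDR_SPACE = 0;
constexpr std::size_t SEG_LEAF_SPACE =
    FIL_PAGE_DATA + PAGE_BTR_SEG_LEAF + FSEG_HDR_SPACE;
constexpr std::size_t SEG_TOP_SPACE =
    FIL_PAGE_DATA + PAGE_BTR_SEG_TOP + FSEG_HDR_SPACE;

uint32_t read_be32(const unsigned char *b) {
  return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16) |
         (uint32_t(b[2]) << 8) | uint32_t(b[3]);
}

void write_be32(unsigned char *b, uint32_t v) {
  b[0] = static_cast<unsigned char>(v >> 24);
  b[1] = static_cast<unsigned char>(v >> 16);
  b[2] = static_cast<unsigned char>(v >> 8);
  b[3] = static_cast<unsigned char>(v);
}

// Same steps as the patch: write the ID into the leaf segment header,
// copy those 4 bytes to the non-leaf segment header and, if there is a
// compressed page image (zip != nullptr), to both positions there too.
void set_root_seg_space_ids(unsigned char *page, unsigned char *zip,
                            uint32_t space_id) {
  unsigned char *b = page + SEG_LEAF_SPACE;
  write_be32(b, space_id);
  std::memcpy(page + SEG_TOP_SPACE, b, 4);
  if (zip != nullptr) {
    std::memcpy(zip + SEG_TOP_SPACE, b, 4);
    std::memcpy(zip + SEG_LEAF_SPACE, b, 4);
  }
}

// Usage: a compressed page first, then an uncompressed one.
void demo_fix() {
  std::vector<unsigned char> page(16384, 0), zip(8192, 0);
  set_root_seg_space_ids(page.data(), zip.data(), 7);
  assert(read_be32(&page[SEG_LEAF_SPACE]) == 7);
  assert(read_be32(&page[SEG_TOP_SPACE]) == 7);
  assert(read_be32(&zip[SEG_LEAF_SPACE]) == 7);
  assert(read_be32(&zip[SEG_TOP_SPACE]) == 7);
  set_root_seg_space_ids(page.data(), nullptr, 9);
  assert(read_be32(&page[SEG_TOP_SPACE]) == 9);
  assert(read_be32(&zip[SEG_TOP_SPACE]) == 7);  // zip image left untouched
}
```

Writing the same bytes into the compressed image presumably keeps it consistent with the uncompressed page header, which the page_zip_validate() call under UNIV_ZIP_DEBUG would otherwise be likely to flag.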
I pushed the fix to 10.5 to get a clean kvm-asan run for the 10.5.2 release, without having to wait several days for a merge from 10.2.
While chasing this down, I worked on cleaning up the IMPORT tests. I plan to push that cleanup along with the fix to the 10.2 branch. I will close the bug once that is done.
Marko Mäkelä