[MDEV-27577] Recovery behaves differently with DESC PK in innodb.leaf_page_corrupted_during_recovery - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Minor
Resolution: Not a Bug
Affects Version/s: N/A
Fix Version/s: N/A
Component/s: Storage Engine - InnoDB, Tests
Labels:
None

Description

I'm almost certain that it's not a real problem, since innodb.leaf_page_corrupted_during_recovery is a rather low-level and internals-specific test. Still, better to double-check, to ensure that we don't have some well-hidden issue upon recovery which might be not easy to reveal otherwise.

diff --git a/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test b/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test

index 92c3e2fbb08..f3ee93192ff 100644

--- a/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test

+++ b/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test

@@ -11,7 +11,7 @@ call mtr.add_suppression("\\[ERROR\\] InnoDB: We detected index corruption");

 call mtr.add_suppression("\\[ERROR\\] (mysqld|mariadbd).*: Index for table 't1' is corrupt; try to repair it");

 call mtr.add_suppression("InnoDB: btr_pcur_open_low level: 0 table: `test`\\.`t1` index: `PRIMARY`");

 --enable_query_log

-CREATE TABLE t1 (pk INT PRIMARY KEY, c CHAR(255))ENGINE=InnoDB STATS_PERSISTENT=0;

+CREATE TABLE t1 (pk INT, PRIMARY KEY (pk DESC), c CHAR(255))ENGINE=InnoDB STATS_PERSISTENT=0;

 SET GLOBAL INNODB_LIMIT_OPTIMISTIC_INSERT_DEBUG = 2;

preview-10.8-MDEV-13756-desc-indexes c10e10c693 with the test modified as above
innodb.leaf_page_corrupted_during_recovery 'innodb,strict_crc32' [ fail ]
Test ended at 2022-01-22 16:57:20

CURRENT_TEST: innodb.leaf_page_corrupted_during_recovery
mysqltest: At line 48: query 'SELECT * FROM t1 WHERE PK = 1' failed with wrong errno ER_TABLE_CORRUPT (1877): 'Table test/t1 is corrupted. Please drop the table and recreate.', instead of ER_UNKNOWN_STORAGE_ENGINE (1286)...

So, if I get the difference correctly from the error log, in the "normal" execution the corruption is detected upon recovery, InnoDB reports it once and the plugin initialization aborts, while with the modified execution it starts up and only reveals corruption at a later stage.

Attachments

Issue Links

relates to

MDEV-13756 Implement descending index: KEY (a DESC, b ASC)

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2022-01-23 14:47

Is this really a bug? The test is corrupting a page 19 of the tablespace and then expecting things to fail in a particular way.

My guess is that 19 used to be the first leaf page to be visited by the SELECT statement, and now it is some other page.

The test passes (only produces a result mismatch) if I reverse the keys on insert, update, and select. Note: the UPDATE had better not change the length of the column.

diff --git a/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test b/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test

index 92c3e2fbb08..df685804d22 100644

--- a/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test

+++ b/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test

@@ -11,14 +11,15 @@ call mtr.add_suppression("\\[ERROR\\] InnoDB: We detected index corruption");

 call mtr.add_suppression("\\[ERROR\\] (mysqld|mariadbd).*: Index for table 't1' is corrupt; try to repair it");

 call mtr.add_suppression("InnoDB: btr_pcur_open_low level: 0 table: `test`\\.`t1` index: `PRIMARY`");

 --enable_query_log

-CREATE TABLE t1 (pk INT PRIMARY KEY, c CHAR(255))ENGINE=InnoDB STATS_PERSISTENT=0;

+CREATE TABLE t1 (pk INT, PRIMARY KEY (pk DESC), c CHAR(255))ENGINE=InnoDB STATS_PERSISTENT=0;

 SET GLOBAL INNODB_LIMIT_OPTIMISTIC_INSERT_DEBUG = 2;

-INSERT INTO t1 VALUES(1, 'sql'), (2, 'server'), (3, 'mariadb'),

-	(4, 'mariadb'), (5, 'test1'), (6, 'test2'), (7, 'test3'),

-	(8, 'test4'), (9, 'test5'), (10, 'test6'), (11, 'test7'),

-	(12, 'test8');

+INSERT INTO t1 VALUES

+(12, 'test8'), (11, 'test7'), (10, 'test6'),

+(9, 'test5'), (8, 'test4'), (7, 'test3'),

+(6, 'test2'), (5, 'test1'), (4, 'mariadb'),

+(3, 'mariadb'), (2, 'server'), (1, 'sql');

 let $restart_noprint=2;

 --source include/restart_mysqld.inc

@@ -28,7 +29,7 @@ let MYSQLD_DATADIR=`select @@datadir`;

 SELECT COUNT(*) FROM t1;

 --source ../include/no_checkpoint_start.inc

-UPDATE t1 SET c='best8' WHERE pk=12;

+UPDATE t1 SET c='qsl' WHERE pk=1;

 --let CLEANUP_IF_CHECKPOINT=DROP TABLE t1;

 --source ../include/no_checkpoint_end.inc

@@ -49,9 +50,9 @@ SELECT * FROM t1 WHERE PK = 1;

 let $restart_parameters=--innodb-force-recovery=1;

 --source include/restart_mysqld.inc

-SELECT * FROM t1 WHERE PK = 1;

+SELECT * FROM t1 WHERE PK = 12;

 --error ER_NOT_KEYFILE

-SELECT * FROM t1 WHERE pk = 12;

+SELECT * FROM t1 WHERE pk = 1;

 DROP TABLE t1;

 let $restart_parameters=;

Marko Mäkelä added a comment - 2022-01-23 14:47 Is this really a bug? The test is corrupting a page 19 of the tablespace and then expecting things to fail in a particular way. My guess is that 19 used to be the first leaf page to be visited by the SELECT statement, and now it is some other page. The test passes (only produces a result mismatch) if I reverse the keys on insert, update, and select. Note: the UPDATE had better not change the length of the column. diff --git a/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test b/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test index 92c3e2fbb08..df685804d22 100644 --- a/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test +++ b/mysql-test/suite/innodb/t/leaf_page_corrupted_during_recovery.test @@ -11,14 +11,15 @@ call mtr.add_suppression("\\[ERROR\\] InnoDB: We detected index corruption"); call mtr.add_suppression("\\[ERROR\\] (mysqld|mariadbd).*: Index for table 't1' is corrupt; try to repair it"); call mtr.add_suppression("InnoDB: btr_pcur_open_low level: 0 table: `test`\\.`t1` index: `PRIMARY`"); --enable_query_log -CREATE TABLE t1 (pk INT PRIMARY KEY, c CHAR(255))ENGINE=InnoDB STATS_PERSISTENT=0; +CREATE TABLE t1 (pk INT, PRIMARY KEY (pk DESC), c CHAR(255))ENGINE=InnoDB STATS_PERSISTENT=0; SET GLOBAL INNODB_LIMIT_OPTIMISTIC_INSERT_DEBUG = 2; -INSERT INTO t1 VALUES(1, 'sql'), (2, 'server'), (3, 'mariadb'), - (4, 'mariadb'), (5, 'test1'), (6, 'test2'), (7, 'test3'), - (8, 'test4'), (9, 'test5'), (10, 'test6'), (11, 'test7'), - (12, 'test8'); +INSERT INTO t1 VALUES +(12, 'test8'), (11, 'test7'), (10, 'test6'), +(9, 'test5'), (8, 'test4'), (7, 'test3'), +(6, 'test2'), (5, 'test1'), (4, 'mariadb'), +(3, 'mariadb'), (2, 'server'), (1, 'sql'); let $restart_noprint=2; --source include/restart_mysqld.inc @@ -28,7 +29,7 @@ let MYSQLD_DATADIR=`select @@datadir`; SELECT COUNT(*) FROM t1; --source ../include/no_checkpoint_start.inc -UPDATE t1 SET c='best8' WHERE pk=12; +UPDATE t1 SET c='qsl' WHERE pk=1; --let CLEANUP_IF_CHECKPOINT=DROP TABLE t1; --source ../include/no_checkpoint_end.inc @@ -49,9 +50,9 @@ SELECT * FROM t1 WHERE PK = 1; let $restart_parameters=--innodb-force-recovery=1; --source include/restart_mysqld.inc -SELECT * FROM t1 WHERE PK = 1; +SELECT * FROM t1 WHERE PK = 12; --error ER_NOT_KEYFILE -SELECT * FROM t1 WHERE pk = 12; +SELECT * FROM t1 WHERE pk = 1; DROP TABLE t1; let $restart_parameters=;

Elena Stepanova added a comment - 2022-01-23 21:36 - edited

Is this really a bug?

marko,
I can only repeat the note which I would hope to be a sufficient explanation for this report:
I'm almost certain that it's not a real problem, since innodb.leaf_page_corrupted_during_recovery is a rather low-level and internals-specific test. Still, better to double-check.

To make it even clearer, my concern was not that the test fails in a different way, but that InnoDB actually behaves the different way: even although a page is still obviously corrupted, it doesn't anymore detect it upon initialization on startup.
However, I cannot answer whether it is a bug or not, it is your test and your code. If you think it's not a bug, please feel free to close it as such.

Elena Stepanova added a comment - 2022-01-23 21:36 - edited Is this really a bug? marko , I can only repeat the note which I would hope to be a sufficient explanation for this report: I'm almost certain that it's not a real problem, since innodb.leaf_page_corrupted_during_recovery is a rather low-level and internals-specific test. Still, better to double-check . To make it even clearer, my concern was not that the test fails in a different way, but that InnoDB actually behaves the different way: even although a page is still obviously corrupted, it doesn't anymore detect it upon initialization on startup. However, I cannot answer whether it is a bug or not, it is your test and your code. If you think it's not a bug, please feel free to close it as such.

Marko Mäkelä added a comment - 2022-01-27 08:03

I checked this as follows:

./mtr --rr --parallel=auto innodb.leaf_page_corrupted_during_recovery

rr replay var/1/log/mysqld.1.rr/mariadbd-2

break sql_print_error

continue

thread apply all backtrace

Recovery will read the page that the test corrupted, and it will report corruption on read completion:

10.8 5217211e071d1c9943026b429baaaffe8ef8414a
Thread 2 (Thread 21686.21713 (mmap_copy_5_mar)):
#0 sql_print_error (format=format@entry=0x56223c0f94ad "InnoDB: %s") at /mariadb/10.8/sql/log.cc:9165
#1 0x000056223bba6e1f in ib::error::~error (this=this@entry=0x7f79720f38c0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:194
#2 0x000056223bc17c95 in buf_page_t::read_complete (this=0x7f7973f01930, node=@0x56223df44268: {space = 0x56223df440f8, name = 0x56223de2d618 "./test/t1.ibd", handle = {m_file = 12, m_psi = 0x7f797a0eb640}, on_ssd = 0, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 21, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {_M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at /mariadb/10.8/storage/innobase/buf/buf0buf.cc:3647
#3 0x000056223bca6c66 in fil_aio_callback (request=@0x56223de84690: {bpage = 0x7f7973f01930, slot = 0x0, node = 0x56223df44268, type = IORequest::READ_ASYNC}) at /mariadb/10.8/storage/innobase/fil/fil0fil.cc:2858
#4 0x000056223ba4b1cc in io_callback (cb=0x56223de84618) at /mariadb/10.8/storage/innobase/os/os0file.cc:3579
#5 0x000056223bd2f06d in tpool::task_group::execute (this=0x56223de719a0, t=t@entry=0x56223de84670) at /mariadb/10.8/tpool/task_group.cc:55
…

This page read was requested by the main thread:

10.8 5217211e071d1c9943026b429baaaffe8ef8414a
Thread 1 (Thread 21686.21686 (mmap_copy_5_mar)):
…
#4 0x000056223bd9bdb8 in safe_cond_wait (cond=0x56223c747570 <recv_sys+176>, mp=0x56223c7474c0 <recv_sys>, file=0x56223c0c2c78 "/mariadb/10.8/storage/innobase/log/log0recv.cc", line=3423) at /mariadb/10.8/mysys/thr_mutex.c:494
#5 0x000056223ba208b2 in recv_sys_t::apply (this=0x56223c7474c0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.8/storage/innobase/log/log0recv.cc:3423
#6 0x000056223bb6289e in srv_start (create_new_db=<optimized out>) at /mariadb/10.8/storage/innobase/srv/srv0start.cc:1244
#7 0x000056223b97f50f in innodb_init (p=<optimized out>) at /mariadb/10.8/storage/innobase/handler/ha_innodb.cc:4155

The read of the corrupted page read will lead to aborted recovery. The test expects that InnoDB will fail to start up. With the suggested small change, a different page would be corrupted, and InnoDB would start up, and page corruption would be noticed (without triggering an infamous ~~MDEV-13542~~ crash, yay!) while executing the statement.

With my more extensive change to the test, it will corrupt the right page again. As far as I can tell, there indeed is no real problem.

Marko Mäkelä added a comment - 2022-01-27 08:03 I checked this as follows: ./mtr --rr --parallel=auto innodb.leaf_page_corrupted_during_recovery rr replay var/1/log/mysqld.1.rr/mariadbd-2 break sql_print_error continue thread apply all backtrace Recovery will read the page that the test corrupted, and it will report corruption on read completion: 10.8 5217211e071d1c9943026b429baaaffe8ef8414a Thread 2 (Thread 21686.21713 (mmap_copy_5_mar)): #0 sql_print_error (format=format@entry=0x56223c0f94ad "InnoDB: %s") at /mariadb/10.8/sql/log.cc:9165 #1 0x000056223bba6e1f in ib::error::~error (this=this@entry=0x7f79720f38c0, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:194 #2 0x000056223bc17c95 in buf_page_t::read_complete (this=0x7f7973f01930, node=@0x56223df44268: {space = 0x56223df440f8, name = 0x56223de2d618 "./test/t1.ibd", handle = {m_file = 12, m_psi = 0x7f797a0eb640}, on_ssd = 0, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 21, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {_M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at /mariadb/10.8/storage/innobase/buf/buf0buf.cc:3647 #3 0x000056223bca6c66 in fil_aio_callback (request=@0x56223de84690: {bpage = 0x7f7973f01930, slot = 0x0, node = 0x56223df44268, type = IORequest::READ_ASYNC}) at /mariadb/10.8/storage/innobase/fil/fil0fil.cc:2858 #4 0x000056223ba4b1cc in io_callback (cb=0x56223de84618) at /mariadb/10.8/storage/innobase/os/os0file.cc:3579 #5 0x000056223bd2f06d in tpool::task_group::execute (this=0x56223de719a0, t=t@entry=0x56223de84670) at /mariadb/10.8/tpool/task_group.cc:55 … This page read was requested by the main thread: 10.8 5217211e071d1c9943026b429baaaffe8ef8414a Thread 1 (Thread 21686.21686 (mmap_copy_5_mar)): … #4 0x000056223bd9bdb8 in safe_cond_wait (cond=0x56223c747570 <recv_sys+176>, mp=0x56223c7474c0 <recv_sys>, file=0x56223c0c2c78 "/mariadb/10.8/storage/innobase/log/log0recv.cc", line=3423) at /mariadb/10.8/mysys/thr_mutex.c:494 #5 0x000056223ba208b2 in recv_sys_t::apply (this=0x56223c7474c0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.8/storage/innobase/log/log0recv.cc:3423 #6 0x000056223bb6289e in srv_start (create_new_db=<optimized out>) at /mariadb/10.8/storage/innobase/srv/srv0start.cc:1244 #7 0x000056223b97f50f in innodb_init (p=<optimized out>) at /mariadb/10.8/storage/innobase/handler/ha_innodb.cc:4155 The read of the corrupted page read will lead to aborted recovery. The test expects that InnoDB will fail to start up. With the suggested small change, a different page would be corrupted, and InnoDB would start up, and page corruption would be noticed (without triggering an infamous MDEV-13542 crash, yay!) while executing the statement. With my more extensive change to the test, it will corrupt the right page again. As far as I can tell, there indeed is no real problem.

MariaDB Server

Recovery behaves differently with DESC PK in innodb.leaf_page_corrupted_during_recovery

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration