[MDEV-29438] Recovery or backup of instant ALTER TABLE is incorrect - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.2, 10.6.0, 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
Fix Version/s: 10.5.18, 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2
Component/s: Backup, Storage Engine - InnoDB
Labels:
- regression-10.5

Description

mleich provided rr replay traces where mariadb-backup --prepare fails like this:

10.6 92032499874259bae7455130958ea7f38c4d53a3
2022-08-31 9:06:55 0 [Note] InnoDB: Starting final batch to recover 241 pages from redo log.
2022-08-31 9:06:58 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=0, page number=514]
2022-08-31 9:06:58 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
2022-08-31 9:06:58 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=514]; set innodb_force_recovery to ignore
2022-08-31 9:06:58 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1490] with error Data structure corruption

I extracted a copy of the page both from the backup and from the server at the logical point of time when the OPT_PAGE_CHECKSUM record was written. Apart from FIL_PAGE_LSN which is excluded from the checksum, the pages differ as follows:

@@ -38,7 +38,7 @@

 0004e0 20 20 20 20 20 20 20 20 00 00 00 00 00 00 00 00

 0004f0 01 01 00 00 00 00 00 00 00 00 00 f0 3f 00 00 00

 000500 00 00 00 f0 3f 03 01 30 00 3c fb 73 00 00 00 00

-000510 00 00 00 00 07 0a 8b 00 00 01 46 0d 17 00 00 00

+000510 00 00 00 00 06 0a 8b 00 00 01 46 0d 17 00 00 00

 000520 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

 000530 00 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

 000540 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

The difference is for a clustered index leaf page record that starts at 0x50c. The DB_TRX_ID=0x70a would be incorrectly recovered as 0x60a. This difference was caught thanks to ~~MDEV-18976~~.

The columns of the index seem to be something like the following:
(id INT UNSIGNED NOT NULL, DB_TRX_ID, DB_ROLL_PTR, pad CHAR(60), c CHAR(120), k INT, geocol2 GEOMETRY, tcol CHAR(3)).

On recovery, the incorrect byte was recovered by the following:

10.6 92032499874259bae7455130958ea7f38c4d53a3
(rr) frame 2
#2 0x000055f5b96c8359 in page_apply_insert_dynamic (block=..., reuse=false,
prev=0, shift=0, enc_hdr_l=23, hdr_c=0, data_c=9, data=0x4e0c52e6e807,
data_len=212)
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:2838
2838 memcpy(buf, prev_rec, data_c);

The buf starts at 0x50c (the start of the first column), and the prev_rec at offset 0x63 contains the following:

(rr) p/x *prev_rec@data_c

$12 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x6}

It is worth noting that right after the incorrect data had been copied, that prev_rec field would have been adjusted to the correct value:

#0  mach_write_to_1 (b=0x4e0c5298806b "\a", n=7)

    at /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/mach0data.inl:45

#1  0x000055f5b96b4204 in rec_set_bit_field_1 (rec=0x4e0c52988070 "", val=7,

    offs=5, mask=15, shift=0)

    at /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/rem0rec.inl:159

#2  0x000055f5b96c83ab in page_apply_insert_dynamic (block=..., reuse=false,

    prev=0, shift=0, enc_hdr_l=23, hdr_c=0, data_c=9, data=0x4e0c52e6e807,

    data_len=212)

    at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:2842

…

(rr) frame 2

#2  0x000055f5b96c83ab in page_apply_insert_dynamic (block=..., reuse=false,

    prev=0, shift=0, enc_hdr_l=23, hdr_c=0, data_c=9, data=0x4e0c52e6e807,

    data_len=212)

    at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:2842

2842	  rec_set_bit_field_1(owner_rec, n_owned + 1, REC_NEW_N_OWNED,

Offset 0x63 is where the page infimum pseudo-record is stored. Because an instant ADD/DROP COLUMN has been executed on this table, the record will not contain the string infimum but something else (NUL bytes followed by the header of the supremum record).

I think that we must ‘pessimize’ the implementation of ~~MDEV-21724~~ and never write log that would copy something from the infimum record to the first actual user record in the table.

Attachments

Issue Links

causes

MDEV-29471 Buffer overflow in page_cur_insert_rec_low()

Closed

is caused by

MDEV-21724 Optimize page_cur_insert_rec_low() redo logging

Closed

relates to

MDEV-15562 Instant DROP COLUMN or changing the order of columns

Closed

MDEV-29440 InnoDB instant ALTER TABLE recovery wrongly uses READ COMMITTED isolation level instead of READ UNCOMMITTED

Closed

MDEV-18976 Implement a CHECKSUM redo log record for improved validation

Closed

MDEV-28668 Recovery or backup of INSERT may be incorrect

Closed

(1 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2022-09-01 09:48

The infimum and supremum strings were repurposed in ~~MDEV-15562~~.

The insert of the record that is logged incorrectly is actually for the hidden metadata record of a ~~MDEV-15562~~ operation:

10.6 92032499874259bae7455130958ea7f38c4d53a3
#2 0x00005572faaadca1 in page_cur_insert_rec_low (cur=0x58551c4f5048, index=0x616006e1a608, rec=0x6140000254cf "", offsets=0x58551c4f5170, mtr=0x58551c4f5450)
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:1525
#3 0x00005572fad90d7d in page_cur_tuple_insert (cursor=0x58551c4f5048, tuple=0x61d0009f1e88, index=0x616006e1a608, offsets=0x58551c4f5000, heap=0x58551c4f4fe0, n_ext=1, mtr=0x58551c4f5450)
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/page0cur.inl:199
#4 0x00005572fada7631 in btr_cur_optimistic_insert (flags=2, cursor=0x58551c4f5040, offsets=0x58551c4f5000, heap=0x58551c4f4fe0, entry=0x61d0009f1e88, rec=0x58551c4f5020, big_rec=0x58551c4f4fc0, n_ext=1,
thr=0x61d0009f2070, mtr=0x58551c4f5450) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/btr/btr0cur.cc:3500
#5 0x00005572fab90ae2 in row_ins_clust_index_entry_low (flags=2, mode=10, index=0x616006e1a608, n_uniq=1, entry=0x61d0009f1e88, n_ext=0, thr=0x61d0009f2070)
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/row/row0ins.cc:2764
#6 0x00005572fa8dcfcd in innobase_instant_try (ha_alter_info=0x58551c4f7a90, ctx=0x62b00011aad0, altered_table=0x58551c4f8190, table=0x619001cea398, trx=0x75cd11101440)
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/handler/handler0alter.cc:6177
#7 0x00005572fa92d529 in commit_try_norebuild (ha_alter_info=0x58551c4f7a90, ctx=0x62b00011aad0, altered_table=0x58551c4f8190, old_table=0x619001cea398, trx=0x75cd11101440, table_name=0x61b000092155 "t18")
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/handler/handler0alter.cc:10559
#8 0x00005572fa905753 in ha_innobase::commit_inplace_alter_table (this=0x61d000885eb8, altered_table=0x58551c4f8190, ha_alter_info=0x58551c4f7a90, commit=true)
at /data/Server/bb-10.6-MDEV-29374/storage/innobase/handler/handler0alter.cc:11308
#9 0x00005572f9f5c304 in handler::ha_commit_inplace_alter_table (this=0x61d000885eb8, altered_table=0x58551c4f8190, ha_alter_info=0x58551c4f7a90, commit=true)
at /data/Server/bb-10.6-MDEV-29374/sql/handler.cc:5234
#10 0x00005572f99d192a in mysql_inplace_alter_table (thd=0x62b000149218, table_list=0x62b000118400, table=0x619001cea398, altered_table=0x58551c4f8190, ha_alter_info=0x58551c4f7a90,
target_mdl_request=0x58551c4f7b90, ddl_log_state=0x58551c4f78b0, trigger_param=0x58551c4f8600, alter_ctx=0x58551c4f90a0) at /data/Server/bb-10.6-MDEV-29374/sql/sql_table.cc:7471
#11 0x00005572f99e55b6 in mysql_alter_table (thd=0x62b000149218, new_db=0x62b00014dc18, new_name=0x62b00014e030, create_info=0x58551c4fa550, table_list=0x62b000118400, alter_info=0x58551c4fa420, order_num=0,
order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.6-MDEV-29374/sql/sql_table.cc:10278
#12 0x00005572f9b74c76 in Sql_cmd_alter_table::execute (this=0x62b000118ce8, thd=0x62b000149218) at /data/Server/bb-10.6-MDEV-29374/sql/sql_alter.cc:542
#13 0x00005572f9774c79 in mysql_execute_command (thd=0x62b000149218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-29374/sql/sql_parse.cc:5997
#14 0x00005572f9781046 in mysql_parse (thd=0x62b000149218, rawbuf=0x62b000118238 "ALTER TABLE t18 ADD COLUMN IF NOT EXISTS tcol7 TIME DEFAULT NULL FIRST, LOCK=SHARED /* E_R Thread2 QNO 2036 CON_ID 19 */",
length=120, parser_state=0x58551c4fba20) at /data/Server/bb-10.6-MDEV-29374/sql/sql_parse.cc:8030

The ADD COLUMN…FIRST will cause columns to be reordered in the table.

For reproducing this bug, I think that it is important that the table contains INT UNSIGNED PRIMARY KEY. Execute something like this in a loop, and concurrently run backup and restore:

CREATE TABLE t(a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB;

INSERT INTO t VALUES …;

ALTER TABLE t ADD COLUMN b INT NULL FIRST;

DROP TABLE t;

Marko Mäkelä added a comment - 2022-09-01 09:48 The infimum and supremum strings were repurposed in MDEV-15562 . The insert of the record that is logged incorrectly is actually for the hidden metadata record of a MDEV-15562 operation: 10.6 92032499874259bae7455130958ea7f38c4d53a3 #2 0x00005572faaadca1 in page_cur_insert_rec_low (cur=0x58551c4f5048, index=0x616006e1a608, rec=0x6140000254cf "", offsets=0x58551c4f5170, mtr=0x58551c4f5450) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:1525 #3 0x00005572fad90d7d in page_cur_tuple_insert (cursor=0x58551c4f5048, tuple=0x61d0009f1e88, index=0x616006e1a608, offsets=0x58551c4f5000, heap=0x58551c4f4fe0, n_ext=1, mtr=0x58551c4f5450) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/page0cur.inl:199 #4 0x00005572fada7631 in btr_cur_optimistic_insert (flags=2, cursor=0x58551c4f5040, offsets=0x58551c4f5000, heap=0x58551c4f4fe0, entry=0x61d0009f1e88, rec=0x58551c4f5020, big_rec=0x58551c4f4fc0, n_ext=1, thr=0x61d0009f2070, mtr=0x58551c4f5450) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/btr/btr0cur.cc:3500 #5 0x00005572fab90ae2 in row_ins_clust_index_entry_low (flags=2, mode=10, index=0x616006e1a608, n_uniq=1, entry=0x61d0009f1e88, n_ext=0, thr=0x61d0009f2070) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/row/row0ins.cc:2764 #6 0x00005572fa8dcfcd in innobase_instant_try (ha_alter_info=0x58551c4f7a90, ctx=0x62b00011aad0, altered_table=0x58551c4f8190, table=0x619001cea398, trx=0x75cd11101440) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/handler/handler0alter.cc:6177 #7 0x00005572fa92d529 in commit_try_norebuild (ha_alter_info=0x58551c4f7a90, ctx=0x62b00011aad0, altered_table=0x58551c4f8190, old_table=0x619001cea398, trx=0x75cd11101440, table_name=0x61b000092155 "t18") at /data/Server/bb-10.6-MDEV-29374/storage/innobase/handler/handler0alter.cc:10559 #8 0x00005572fa905753 in ha_innobase::commit_inplace_alter_table (this=0x61d000885eb8, altered_table=0x58551c4f8190, ha_alter_info=0x58551c4f7a90, commit=true) at /data/Server/bb-10.6-MDEV-29374/storage/innobase/handler/handler0alter.cc:11308 #9 0x00005572f9f5c304 in handler::ha_commit_inplace_alter_table (this=0x61d000885eb8, altered_table=0x58551c4f8190, ha_alter_info=0x58551c4f7a90, commit=true) at /data/Server/bb-10.6-MDEV-29374/sql/handler.cc:5234 #10 0x00005572f99d192a in mysql_inplace_alter_table (thd=0x62b000149218, table_list=0x62b000118400, table=0x619001cea398, altered_table=0x58551c4f8190, ha_alter_info=0x58551c4f7a90, target_mdl_request=0x58551c4f7b90, ddl_log_state=0x58551c4f78b0, trigger_param=0x58551c4f8600, alter_ctx=0x58551c4f90a0) at /data/Server/bb-10.6-MDEV-29374/sql/sql_table.cc:7471 #11 0x00005572f99e55b6 in mysql_alter_table (thd=0x62b000149218, new_db=0x62b00014dc18, new_name=0x62b00014e030, create_info=0x58551c4fa550, table_list=0x62b000118400, alter_info=0x58551c4fa420, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.6-MDEV-29374/sql/sql_table.cc:10278 #12 0x00005572f9b74c76 in Sql_cmd_alter_table::execute (this=0x62b000118ce8, thd=0x62b000149218) at /data/Server/bb-10.6-MDEV-29374/sql/sql_alter.cc:542 #13 0x00005572f9774c79 in mysql_execute_command (thd=0x62b000149218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-29374/sql/sql_parse.cc:5997 #14 0x00005572f9781046 in mysql_parse (thd=0x62b000149218, rawbuf=0x62b000118238 "ALTER TABLE t18 ADD COLUMN IF NOT EXISTS tcol7 TIME DEFAULT NULL FIRST, LOCK=SHARED /* E_R Thread2 QNO 2036 CON_ID 19 */", length=120, parser_state=0x58551c4fba20) at /data/Server/bb-10.6-MDEV-29374/sql/sql_parse.cc:8030 The ADD COLUMN…FIRST will cause columns to be reordered in the table. For reproducing this bug, I think that it is important that the table contains INT UNSIGNED PRIMARY KEY . Execute something like this in a loop, and concurrently run backup and restore: CREATE TABLE t(a INT UNSIGNED PRIMARY KEY ) ENGINE=InnoDB; INSERT INTO t VALUES …; ALTER TABLE t ADD COLUMN b INT NULL FIRST ; DROP TABLE t;

Marko Mäkelä added a comment - 2022-09-01 10:20

My attempt at reproducing this reproduced something else:

diff --git a/mysql-test/suite/innodb/t/instant_alter_crash.test b/mysql-test/suite/innodb/t/instant_alter_crash.test

index ddeaa6885e6..8c08ad0f6d9 100644

--- a/mysql-test/suite/innodb/t/instant_alter_crash.test

+++ b/mysql-test/suite/innodb/t/instant_alter_crash.test

@@ -198,6 +198,29 @@ disconnect ddl;

 SHOW CREATE TABLE t1;

 SHOW CREATE TABLE t2;

 SHOW CREATE TABLE t3;

-DROP TABLE t1,t2,t3;

+DROP TABLE t2,t3;

+--echo #

+--echo # MDEV-29438 Recovery or backup of instant ALTER TABLE is incorrect

+--echo #

+CREATE TABLE t2(a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB;

+INSERT INTO t2 VALUES (1),(2),(3),(4),(5),(6);

+connect ddl, localhost, root;

+SET DEBUG_SYNC='innodb_alter_inplace_before_commit SIGNAL ddl WAIT_FOR ever';

+--send

+ALTER TABLE t2 ADD COLUMN b TINYINT UNSIGNED NOT NULL DEFAULT 42 FIRST;

+connection default;

+SET DEBUG_SYNC='now WAIT_FOR ddl';

+SET GLOBAL innodb_flush_log_at_trx_commit=1;

+DELETE FROM t1;

+--source include/kill_mysqld.inc

+disconnect ddl;

+--source include/start_mysqld.inc

+CHECK TABLE t2;

+DROP TABLE t1,t2;

 --list_files $MYSQLD_DATADIR/test

10.6 40aa94df356cfa000fc43c92ff0061212d1b161d

2022-09-01 13:19:14 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=54849,54849

mariadbd: /mariadb/10.6/storage/innobase/rem/rem0rec.cc:304: void rec_init_offsets_comp_ordinary(const rec_t*, const dict_index_t*, rec_offs*, ulint, const dict_col_t::def_t*, rec_leaf_format) [with bool mblob = true; bool redundant_temp = false; rec_t = unsigned char; rec_offs = short unsigned int; ulint = long unsigned int]: Assertion `n_fields <= ulint(index->n_fields) + 1' failed.

The index->n_fields=3 corresponds to (a,DB_TRX_ID,DB_ROLL_PTR) and the n_fields=5 corresponds to the metadata record (a,DB_TRX_ID,DB_ROLL_PTR,blob,b).

Marko Mäkelä added a comment - 2022-09-01 10:20 My attempt at reproducing this reproduced something else: diff --git a/mysql-test/suite/innodb/t/instant_alter_crash.test b/mysql-test/suite/innodb/t/instant_alter_crash.test index ddeaa6885e6..8c08ad0f6d9 100644 --- a/mysql-test/suite/innodb/t/instant_alter_crash.test +++ b/mysql-test/suite/innodb/t/instant_alter_crash.test @@ -198,6 +198,29 @@ disconnect ddl; SHOW CREATE TABLE t1; SHOW CREATE TABLE t2; SHOW CREATE TABLE t3; -DROP TABLE t1,t2,t3; +DROP TABLE t2,t3; + +--echo # +--echo # MDEV-29438 Recovery or backup of instant ALTER TABLE is incorrect +--echo # +CREATE TABLE t2(a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB; +INSERT INTO t2 VALUES (1),(2),(3),(4),(5),(6); + +connect ddl, localhost, root; +SET DEBUG_SYNC='innodb_alter_inplace_before_commit SIGNAL ddl WAIT_FOR ever'; +--send +ALTER TABLE t2 ADD COLUMN b TINYINT UNSIGNED NOT NULL DEFAULT 42 FIRST; + +connection default; +SET DEBUG_SYNC='now WAIT_FOR ddl'; +SET GLOBAL innodb_flush_log_at_trx_commit=1; +DELETE FROM t1; + +--source include/kill_mysqld.inc +disconnect ddl; +--source include/start_mysqld.inc + +CHECK TABLE t2; +DROP TABLE t1,t2; --list_files $MYSQLD_DATADIR/test 10.6 40aa94df356cfa000fc43c92ff0061212d1b161d 2022-09-01 13:19:14 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=54849,54849 mariadbd: /mariadb/10.6/storage/innobase/rem/rem0rec.cc:304: void rec_init_offsets_comp_ordinary(const rec_t*, const dict_index_t*, rec_offs*, ulint, const dict_col_t::def_t*, rec_leaf_format) [with bool mblob = true; bool redundant_temp = false; rec_t = unsigned char; rec_offs = short unsigned int; ulint = long unsigned int]: Assertion `n_fields <= ulint(index->n_fields) + 1' failed. The index->n_fields=3 corresponds to (a,DB_TRX_ID,DB_ROLL_PTR) and the n_fields=5 corresponds to the metadata record (a,DB_TRX_ID,DB_ROLL_PTR,blob,b).

Marko Mäkelä added a comment - 2022-09-01 11:47

That something else is now ~~MDEV-29440~~.

Marko Mäkelä added a comment - 2022-09-01 11:47 That something else is now MDEV-29440 .

Marko Mäkelä added a comment - 2022-09-01 12:53

Finally, I reproduced this without hitting ~~MDEV-29440~~. The UPDATE loop is for incrementing the DB_TRX_ID to a suitable range.

--source include/have_innodb.inc

--source include/not_embedded.inc

--source include/have_debug.inc

--source include/have_debug_sync.inc

CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;

INSERT INTO t1 SET a=1;

--disable_query_log

let $N=768;

while ($N) {

UPDATE t1 SET a=a+1;

dec $N;

--enable_query_log

CREATE TABLE t2(a INT UNSIGNED PRIMARY KEY, b INT NOT NULL DEFAULT 42)

ENGINE=InnoDB;

INSERT INTO t2 (a) VALUES (1),(2),(3),(4);

connect ddl, localhost, root;

SET DEBUG_SYNC='innodb_alter_inplace_before_commit SIGNAL ddl WAIT_FOR ever';

--send

ALTER TABLE t2 MODIFY COLUMN b INT NOT NULL DEFAULT 42 FIRST;

connection default;

SET DEBUG_SYNC='now WAIT_FOR ddl';

SET GLOBAL innodb_flush_log_at_trx_commit=1;

DELETE FROM t1;

--source include/kill_mysqld.inc

disconnect ddl;

--source include/start_mysqld.inc

CHECK TABLE t2;

DROP TABLE t1,t2;

10.6 40aa94df356cfa000fc43c92ff0061212d1b161d
2022-09-01 15:50:35 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=54849,54849
2022-09-01 15:50:35 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=6, page number=3]
2022-09-01 15:50:35 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=3]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 6 row operations to undo
2022-09-01 15:50:35 0 [Note] InnoDB: Trx id counter is 1574
2022-09-01 15:50:35 0 [Note] InnoDB: Starting final batch to recover 12 pages from redo log.
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=324]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=410]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=0]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 0 from file './ibdata1': Page read from tablespace is corrupted.
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=1]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=2]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=3]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=1]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=2]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=4]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
2022-09-01 15:50:35 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1490] with error Data structure corruption
2022-09-01 15:50:35 0 [Note] InnoDB: Starting shutdown...
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=243]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 243 from file './ibdata1': Page read from tablespace is corrupted.
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=1, page number=3]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 3 from file './mysql/innodb_table_stats.ibd': Page read from tablespace is corrupted.
2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=2, page number=3]; set innodb_force_recovery to ignore
2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 3 from file './mysql/innodb_index_stats.ibd': Page read from tablespace is corrupted.
2022-09-01 15:50:35 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-09-01 15:50:35 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

I may have accidentally reproduced another bug in multi-batch recovery, because I only expected that single page to be corrupted.

Marko Mäkelä added a comment - 2022-09-01 12:53 Finally, I reproduced this without hitting MDEV-29440 . The UPDATE loop is for incrementing the DB_TRX_ID to a suitable range. --source include/have_innodb.inc --source include/not_embedded.inc --source include/have_debug.inc --source include/have_debug_sync.inc CREATE TABLE t1(a INT PRIMARY KEY ) ENGINE=InnoDB; INSERT INTO t1 SET a=1; --disable_query_log let $N=768; while ($N) { UPDATE t1 SET a=a+1; dec $N; } --enable_query_log CREATE TABLE t2(a INT UNSIGNED PRIMARY KEY , b INT NOT NULL DEFAULT 42) ENGINE=InnoDB; INSERT INTO t2 (a) VALUES (1),(2),(3),(4); connect ddl, localhost, root; SET DEBUG_SYNC= 'innodb_alter_inplace_before_commit SIGNAL ddl WAIT_FOR ever' ; --send ALTER TABLE t2 MODIFY COLUMN b INT NOT NULL DEFAULT 42 FIRST ; connection default ; SET DEBUG_SYNC= 'now WAIT_FOR ddl' ; SET GLOBAL innodb_flush_log_at_trx_commit=1; DELETE FROM t1; --source include/kill_mysqld.inc disconnect ddl; --source include/start_mysqld.inc CHECK TABLE t2; DROP TABLE t1,t2; 10.6 40aa94df356cfa000fc43c92ff0061212d1b161d 2022-09-01 15:50:35 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=54849,54849 2022-09-01 15:50:35 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=6, page number=3] 2022-09-01 15:50:35 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=3]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 6 row operations to undo 2022-09-01 15:50:35 0 [Note] InnoDB: Trx id counter is 1574 2022-09-01 15:50:35 0 [Note] InnoDB: Starting final batch to recover 12 pages from redo log. 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=324]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=410]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=0]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 0 from file './ibdata1': Page read from tablespace is corrupted. 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=1]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=2]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=3]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=1]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=2]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=6, page number=4]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages. 2022-09-01 15:50:35 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1490] with error Data structure corruption 2022-09-01 15:50:35 0 [Note] InnoDB: Starting shutdown... 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=243]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 243 from file './ibdata1': Page read from tablespace is corrupted. 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=1, page number=3]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 3 from file './mysql/innodb_table_stats.ibd': Page read from tablespace is corrupted. 2022-09-01 15:50:35 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=2, page number=3]; set innodb_force_recovery to ignore 2022-09-01 15:50:35 0 [ERROR] InnoDB: Failed to read page 3 from file './mysql/innodb_index_stats.ibd': Page read from tablespace is corrupted. 2022-09-01 15:50:35 0 [ERROR] Plugin 'InnoDB' init function returned error. 2022-09-01 15:50:35 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. I may have accidentally reproduced another bug in multi-batch recovery, because I only expected that single page to be corrupted.

Marko Mäkelä added a comment - 2022-09-01 13:34

With the following fix, there will be no log apply failure messages:

diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc

index 5f7192675dc..3f863d36a3a 100644

--- a/storage/innobase/page/page0cur.cc

+++ b/storage/innobase/page/page0cur.cc

@@ -1648,7 +1648,7 @@ page_cur_insert_rec_low(

     const byte *r= rec;

     const byte *c= cur->rec;

-    const byte *c_end= cur->rec + data_size;

+    const byte *c_end= c + (page_rec_is_infimum(c) ? 8 : data_size);

     static_assert(REC_N_OLD_EXTRA_BYTES == REC_N_NEW_EXTRA_BYTES + 1, "");

     if (c <= insert_buf && c_end > insert_buf)

       c_end= insert_buf;

10.6 40aa94df356cfa000fc43c92ff0061212d1b161d with the patch
2022-09-01 16:25:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=54849,54849
2022-09-01 16:25:13 0 [ERROR] InnoDB: Table `test`.`t2` contains unrecognizable instant ALTER metadata
2022-09-01 16:25:13 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 6 row operations to undo
2022-09-01 16:25:13 0 [Note] InnoDB: Trx id counter is 1574
2022-09-01 16:25:13 0 [Note] InnoDB: Starting final batch to recover 13 pages from redo log.
2022-09-01 16:25:13 0 [Note] InnoDB: 128 rollback segments are active.
2022-09-01 16:25:13 0 [Note] InnoDB: Rolled back recovered transaction 1571
…
2022-09-01 16:25:13 0 [ERROR] InnoDB: Table `test`.`t2` contains unrecognizable instant ALTER metadata
mariadbd: /mariadb/10.6/storage/innobase/dict/dict0load.cc:2464: dict_table_t* dict_load_table_one(const st_::span<const char>&, dict_err_ignore_t, dict_names_t&): Assertion `!table \|\| (ignore_err & ~DICT_ERR_IGNORE_FK_NOKEY) \|\| !table->is_readable() \|\| !table->corrupted' failed.

That is, we would hit ~~MDEV-29440~~ a little later (this time during innodb_check_version(), invoked by ddl_log_execute_recovery()). I double-checked that without the fix, we will get many ‘extra’ messages for the unrelated pages.

Marko Mäkelä added a comment - 2022-09-01 13:34 With the following fix, there will be no log apply failure messages: diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc index 5f7192675dc..3f863d36a3a 100644 --- a/storage/innobase/page/page0cur.cc +++ b/storage/innobase/page/page0cur.cc @@ -1648,7 +1648,7 @@ page_cur_insert_rec_low( { const byte *r= rec; const byte *c= cur->rec; - const byte *c_end= cur->rec + data_size; + const byte *c_end= c + (page_rec_is_infimum(c) ? 8 : data_size); static_assert(REC_N_OLD_EXTRA_BYTES == REC_N_NEW_EXTRA_BYTES + 1, ""); if (c <= insert_buf && c_end > insert_buf) c_end= insert_buf; 10.6 40aa94df356cfa000fc43c92ff0061212d1b161d with the patch 2022-09-01 16:25:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=54849,54849 2022-09-01 16:25:13 0 [ERROR] InnoDB: Table `test`.`t2` contains unrecognizable instant ALTER metadata 2022-09-01 16:25:13 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 6 row operations to undo 2022-09-01 16:25:13 0 [Note] InnoDB: Trx id counter is 1574 2022-09-01 16:25:13 0 [Note] InnoDB: Starting final batch to recover 13 pages from redo log. 2022-09-01 16:25:13 0 [Note] InnoDB: 128 rollback segments are active. 2022-09-01 16:25:13 0 [Note] InnoDB: Rolled back recovered transaction 1571 … 2022-09-01 16:25:13 0 [ERROR] InnoDB: Table `test`.`t2` contains unrecognizable instant ALTER metadata mariadbd: /mariadb/10.6/storage/innobase/dict/dict0load.cc:2464: dict_table_t* dict_load_table_one(const st_::span<const char>&, dict_err_ignore_t, dict_names_t&): Assertion `!table || (ignore_err & ~DICT_ERR_IGNORE_FK_NOKEY) || !table->is_readable() || !table->corrupted' failed. That is, we would hit MDEV-29440 a little later (this time during innodb_check_version() , invoked by ddl_log_execute_recovery() ). I double-checked that without the fix, we will get many ‘extra’ messages for the unrelated pages.

Marko Mäkelä added a comment - 2022-09-01 13:50

The reason for the flood of messages for unrelated pages is the following in log_phys_t::apply():

      case OPTION:

        ut_ad(rlen == 5);

        ut_ad(*l == OPT_PAGE_CHECKSUM);

        if (page_checksum(block, l + 1))

          applied= APPLIED_YES;

page_corrupted:

          sql_print_error("InnoDB: Set innodb_force_recovery=1"

                          " to ignore corruption.");

          recv_sys.set_corrupt_log();

          return applied;

We are flagging the log corrupted only because a single page is corrupted. This will cause all subsequent invocations of recv_recover_page() to evict pages as corrupted:

set_start_lsn:

		if (recv_sys.is_corrupt_log() && !srv_force_recovery) {

…

			buf_pool.corrupted_evict(&block->page,

						 block->page.state() &

						 buf_page_t::LRU_MASK);

It would be better to introduce another return value, say, log_phys_t::APPLIED_CORRUPTED, to indicate that the page was corrupted. Then there would be no ‘collateral damage’ of claiming unrelated pages as corrupted. The recv_sys.set_corrupt_log() must be reserved to cases where the entire log is corrupted.

Marko Mäkelä added a comment - 2022-09-01 13:50 The reason for the flood of messages for unrelated pages is the following in log_phys_t::apply() : case OPTION: ut_ad(rlen == 5); ut_ad(*l == OPT_PAGE_CHECKSUM); if (page_checksum(block, l + 1)) { applied= APPLIED_YES; page_corrupted: sql_print_error( "InnoDB: Set innodb_force_recovery=1" " to ignore corruption." ); recv_sys.set_corrupt_log(); return applied; } We are flagging the log corrupted only because a single page is corrupted. This will cause all subsequent invocations of recv_recover_page() to evict pages as corrupted: set_start_lsn: if (recv_sys.is_corrupt_log() && !srv_force_recovery) { … buf_pool.corrupted_evict(&block->page, block->page.state() & buf_page_t::LRU_MASK); It would be better to introduce another return value, say, log_phys_t::APPLIED_CORRUPTED , to indicate that the page was corrupted. Then there would be no ‘collateral damage’ of claiming unrelated pages as corrupted. The recv_sys.set_corrupt_log() must be reserved to cases where the entire log is corrupted.

Marko Mäkelä added a comment - 2022-09-05 06:53

I think that this bug should also affect normal tables where the string infimum is being inserted in the first PRIMARY KEY column as the first record of the table, but I was unable to repeat an OPT_PAGE_CHECKSUM mismatch in such a case.

In 10.5, recovery would not invoke buf_pool.corrupted_evict() if it fails to apply some log to a page. That code was added to 10.6.9 in ~~MDEV-13542~~.

Marko Mäkelä added a comment - 2022-09-05 06:53 I think that this bug should also affect normal tables where the string infimum is being inserted in the first PRIMARY KEY column as the first record of the table, but I was unable to repeat an OPT_PAGE_CHECKSUM mismatch in such a case. In 10.5, recovery would not invoke buf_pool.corrupted_evict() if it fails to apply some log to a page. That code was added to 10.6.9 in MDEV-13542 .

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2022-09-01 09:00

Updated:: 2022-09-06 10:39

Resolved:: 2022-09-05 07:32

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server