
InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup

Details

    Description

      Note: Can't be a duplicate of MDEV-19348 anymore, as it was closed in the previous release set.

      10.2 5e65c67c

      ...
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Ignoring data file 'test/t1.ibd' with space ID 275, since the redo log references test/t1.ibd with space ID 274.
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Ignoring data file './test/#sql-2a07_12.ibd' with space ID 275. Another data file called test/t1.ibd exists with the same space ID.
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Ignoring data file './test/#sql-2a07_12.ibd' with space ID 275. Another data file called test/t1.ibd exists with the same space ID.
      2019-10-05  3:25:49 140645837592384 [Warning] InnoDB: Are you sure you are using the right ib_logfiles to start up the database? Log sequence number in the ib_logfiles is 2570499, less than the log sequence number in the first system tablespace file header, 2570528.
      2019-10-05  3:25:49 140645837592384 [Note] InnoDB: Starting final batch to recover 963 pages from redo log.
      2019-10-05  3:25:49 140645669078784 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace test/t1 page [page id: space=275, page number=3]. You may have to recover from a backup.
      2019-10-05  3:25:49 140645669078784 [Note] InnoDB: Uncompressed page, stored checksum in field1 3500439468, calculated checksums for field1: crc32 228036971, innodb 3302731224,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 3500439468, calculated checksums for field2: crc32 228036971, innodb 3703238392, none 3735928559,  page LSN 0 3445649, low 4 bytes of LSN at page end 3445649, page number (if stored to page already) 3, space id (if created with >= MySQL-4.1.1 and stored already) 275
      2019-10-05  3:25:49 140645669078784 [Note] InnoDB: Page may be an index page where index id is 8202515128101958035
      2019-10-05  3:25:49 140645669078784 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      2019-10-05  3:25:49 140645669078784 [ERROR] InnoDB: Failed to read file 'test/t1.ibd' at offset 3: Page read from tablespace is corrupted.
      2019-10-05  3:25:50 140645837592384 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
      2019-10-05  3:25:50 140645837592384 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2250] with error Data structure corruption
      [00] FATAL ERROR: 2019-10-05 03:25:50 mariabackup: innodb_init() returned 39 (Data structure corruption).
      

The test case is non-deterministic; run it with --repeat=N and the other options listed below. It usually fails for me within 5 attempts, but this can vary across machines and builds. The probability seems higher with a larger number of tables, columns and data. Running in shm also seems to help.

      MTR command line

      perl ./mtr bug2.innodb_page_corrupt2 --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4  --mysqld=--file-key-management --mysqld=--file-key-management-filename=`pwd`/std_data/keys.txt --mysqld=--plugin-load-add=file_key_management --mem --repeat=5
      

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (a BINARY(8), b CHAR(8), c INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES ('a','foobar',8);
       
      --exec $XTRABACKUP --backup --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQLTEST_VARDIR/log/backup0.log 2>&1
       
      --connect (con1,localhost,root,,)
      --let $conid= `SELECT CONNECTION_ID() AS con14`
      --delimiter |
      --send
        BEGIN NOT ATOMIC
          LOOP OPTIMIZE TABLE t1;
          END LOOP;
        END |
      --delimiter ;
       
      --connection default
       
      --exec $XTRABACKUP --backup --target-dir=$MYSQLTEST_VARDIR/tmp/backup_1 --incremental-basedir=$MYSQLTEST_VARDIR/tmp/backup_0 --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQLTEST_VARDIR/log/backup1.log 2>&1
       
      --eval KILL $conid
       
      --exec $XTRABACKUP --prepare --apply-log-only --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 > $MYSQLTEST_VARDIR/log/prepare0.log 2>&1
      --error 0,1
      --exec $XTRABACKUP --prepare --apply-log-only --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 --incremental-dir=$MYSQLTEST_VARDIR/tmp/backup_1 > $MYSQLTEST_VARDIR/log/prepare1.log 2>&1
       
      if ($sys_errno)
      {
          --cat_file $MYSQLTEST_VARDIR/log/prepare1.log
          --die "Encountered error upon incremental backup"
      }
       
      # Cleanup
      DROP TABLE t1;
      --rmdir $MYSQLTEST_VARDIR/tmp/backup_0
      --rmdir $MYSQLTEST_VARDIR/tmp/backup_1
      

Fails on 10.2-10.4, on all of debug, ASAN, and non-debug builds.
10.1 fails with something unrelated; I didn't pursue it.
Couldn't reproduce on 10.5.


          Activity

            vlad.lesin Vladislav Lesin added a comment - - edited

            Briefly.

The first page of the tablespace is rewritten while preparing the incremental backup: the tablespace is "created" instead of the existing tablespace being opened, because the tablespace id from the *.meta file does not match the tablespace id of the data file in the full backup directory.

In detail.

I have reproduced the bug locally with the test provided by Elena and done the following analysis.

1) The content of the "corrupted" page read by mariabackup during the incremental --backup and --prepare is the same. But during --backup the page passes validation, and during --prepare it does not.

            2) The "corrupted" page is encrypted. In mariabackup if the page is encrypted, buf_page_is_corrupted() is invoked during backup only if --extended_validation option is set. But even if we set the option, buf_page_is_corrupted() returns false on --backup, and true on --prepare. The difference is that during --backup the page is passed to buf_page_is_corrupted() after decryption, but during --prepare the page is not decrypted before it is passed to the function.

3) The page is not decrypted during --prepare because space->crypt_data is NULL, and it is NULL because fil_space_read_crypt_data() cannot find CRYPT_MAGIC in the first page. So it turned out that the first page of the tablespace does not contain crypt data after the second --prepare, although it does contain it after the first --prepare, which means the first page was rewritten during the incremental --prepare.
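The crypt-data lookup in (3) boils down to scanning page 0 for a magic marker. A minimal sketch, assuming a made-up marker value and layout (the real CRYPT_MAGIC bytes and their fixed offset in page 0 differ):

```python
# Hypothetical marker; the real CRYPT_MAGIC value and page-0 offset differ.
CRYPT_MAGIC = b"CRYPT"


def read_crypt_data(page0: bytes):
    """Return the crypt-data blob if the magic marker is found on page 0,
    else None, mirroring how fil_space_read_crypt_data() leaves
    space->crypt_data NULL when the marker is absent."""
    idx = page0.find(CRYPT_MAGIC)
    if idx < 0:
        return None
    start = idx + len(CRYPT_MAGIC)
    return page0[start:start + 8]


original_page0 = b"header" + CRYPT_MAGIC + b"key-info" + b"\0" * 16
rewritten_page0 = b"\0" * len(original_page0)  # page 0 freshly (re)created

assert read_crypt_data(original_page0) is not None
assert read_crypt_data(rewritten_page0) is None  # so decryption is skipped
```

Once page 0 has been rewritten without the marker, every later consumer of the tablespace sees it as unencrypted, which connects (3) back to the failed validation in (2).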

4) When innodb_log_optimize_ddl is enabled, the backup_optimized_ddl_op() callback is invoked when MLOG_INDEX_LOAD is read during the incremental --backup. The tablespace is pushed into a special container, then treated as a newly created tablespace in backup_fix_ddl() and re-copied as a *.new file. But as this is an incremental backup, all copied data files are passed through the wf_incremental filter, which creates a *.delta file and copies only the pages whose LSN is greater than incremental_lsn. So after backup_fix_ddl() has executed, there is a *.new.delta file containing a single page, the "corrupted" page from (2); the file does not contain the first page of the tablespace.
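The wf_incremental filtering in (4) can be sketched as follows. This is a toy model with invented page numbers and LSNs, only meant to show why page 0 never makes it into the *.new.delta file:

```python
def incremental_filter(pages, incremental_lsn):
    """Sketch of the wf_incremental write filter: keep only the pages whose
    LSN exceeds the LSN recorded at the time of the base (full) backup."""
    return {no: (lsn, data) for no, (lsn, data) in pages.items()
            if lsn > incremental_lsn}


# Toy tablespace right after OPTIMIZE TABLE: page 0 was never flushed
# (LSN still 0), page 3 carries the freshly written (encrypted) index data.
pages = {
    0: (0, b"page0-with-crypt-data-marker"),
    3: (2_570_000, b"encrypted-index-page"),
}

delta = incremental_filter(pages, incremental_lsn=1_000_000)
assert list(delta) == [3]  # page 0 is absent from the resulting *.new.delta
```

Because the re-created tablespace's page 0 has an LSN not above incremental_lsn, the delta ends up holding only the data page, exactly the single-page *.new.delta described above.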

5) When the incremental --prepare is executed (the second --prepare in Elena's test), the *.new.delta file is renamed to a *.ibd.delta file and applied as a regular *.delta file. It turned out that the first page of the tablespace is rewritten when the delta is applied, with the following stack trace:

             #0  pfs_os_file_write_func (type=..., name=0x7ffd5d596cb0 "./test/t1.ibd", file=..., buf=0x557322804000, offset=0, n=16384, 
                 src_file=0x55731f873190 "./extra/mariabackup/xtrabackup.cc", src_line=4705)
                 at ./storage/innobase/include/os0file.ic:413
             #1  0x000055731ed20c98 in xb_space_create_file (path=0x7ffd5d596cb0 "./test/t1.ibd", space_id=255, flags=0, file=0x7ffd5d596380)
                 at ./extra/mariabackup/xtrabackup.cc:4704
             #2  0x000055731ed217bf in xb_delta_open_matching_space (dbname=0x7ffd5d5982d0 "test", name=0x7ffd5d5970b0 "t1.ibd", info=..., 
                 real_name=0x7ffd5d596cb0 "./test/t1.ibd", real_name_len=512, success=0x7ffd5d5969e6)
                 at ./extra/mariabackup/xtrabackup.cc:4897
             #3  0x000055731ed21c0e in xtrabackup_apply_delta (dirname=0x5573203c5e00 <xtrabackup_real_incremental_dir> "/dev/shm/b/mysql-test/var/tmp/backup_1/", 
                 dbname=0x7ffd5d5982d0 "test", filename=0x7ffd5d5992b0 "t1.ibd.delta")
                 at ./extra/mariabackup/xtrabackup.cc:4991
             #4  0x000055731ed22e8a in xb_process_datadir (path=0x5573203c5e00 <xtrabackup_real_incremental_dir> "/dev/shm/b/mysql-test/var/tmp/backup_1/", 
                 suffix=0x55731f878b24 ".delta", func=0x55731ed21811 <xtrabackup_apply_delta(char const*, char const*, char const*, void*)>)
                 at ./extra/mariabackup/xtrabackup.cc:5337
             #5  0x000055731ed22f84 in xtrabackup_apply_deltas () at ./extra/mariabackup/xtrabackup.cc:5374
             #6  0x000055731ed244d0 in xtrabackup_prepare_func (argv=0x55732279b6b8)
                 at ./extra/mariabackup/xtrabackup.cc:5597
            

So the new tablespace is "created" instead of the existing tablespace being opened because the tablespace id from the *.meta file (the tablespace id of the *.new.delta file created in (4)) cannot be found in fil_system->spaces (fil_space_get_by_id(), invoked from xb_delta_open_matching_space(), returns NULL).

fil_system->spaces is filled by xb_load_tablespaces() before the deltas are applied; it iterates over the data files in the full backup directory and fills fil_system->spaces.

So the first page of the tablespace is rewritten while preparing the incremental backup because the tablespace is "created" instead of the existing tablespace being opened, as the tablespace id from the *.meta file does not match the tablespace id of the data file in the full backup directory.
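The lookup-and-create decision described above can be condensed into a toy model of xb_delta_open_matching_space(). The space ids and path are taken from this report's scenario; the data structures are invented for illustration:

```python
def open_or_create_space(fil_spaces, meta_space_id, path):
    """Sketch of the xb_delta_open_matching_space() decision: if the space id
    from the *.meta file is unknown to fil_system->spaces, a brand-new
    tablespace file is created, rewriting page 0 (and with it the crypt data).
    Returns (path, created_flag)."""
    if meta_space_id in fil_spaces:
        return fil_spaces[meta_space_id], False  # open the existing file
    fil_spaces[meta_space_id] = path             # xb_space_create_file() path
    return path, True                            # page 0 gets rewritten


# fil_system->spaces as filled by xb_load_tablespaces() from the full backup:
spaces = {274: "./test/t1.ibd"}

# OPTIMIZE TABLE during the incremental backup gave t1 a new space id (275):
_, created = open_or_create_space(spaces, 275, "./test/t1.ibd")
assert created  # the existing t1.ibd is re-created, losing its crypt data
```

The id mismatch (274 in the full backup vs. 275 in the *.meta file) is the whole trigger: the same on-disk file is treated as a tablespace that does not exist yet.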

            How to fix?

            I see the following variants:

1) I suppose that if the tablespace id changed during the incremental backup, all non-zero pages of this tablespace contain the new tablespace id, i.e. all non-zero pages were changed during the incremental backup, so there is no need to produce a delta in backup_fix_ddl() for such data files; we could just copy the whole data file as *.new.

2) In xb_delta_open_matching_space(), check whether the data file exists in the full backup directory; if it does, change its id, and invoke fil_space_create() only for non-existing files.
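Variant (2) above can be sketched as a file-existence check before the create path. This is a hypothetical simplification (the helper name, arguments, and registry dict are invented), not the actual fix that was pushed:

```python
import os
import tempfile


def open_matching_space_v2(full_backup_dir, rel_path, meta_space_id, spaces):
    """Sketch of fix variant (2): if the data file already exists in the full
    backup directory, adopt it and re-register it under the new space id;
    only create a fresh file when no such data file exists."""
    path = os.path.join(full_backup_dir, rel_path)
    if os.path.exists(path):
        spaces[meta_space_id] = path  # change the id, keep page 0 intact
        return path, "reused"
    open(path, "wb").close()          # the fil_space_create() + new-file path
    spaces[meta_space_id] = path
    return path, "created"


with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "test"))
    open(os.path.join(d, "test", "t1.ibd"), "wb").close()  # full backup copy
    spaces = {}
    _, how = open_matching_space_v2(d, "test/t1.ibd", 275, spaces)

assert how == "reused"  # the existing file is adopted; page 0 survives
```

The key property is that the "reused" branch never rewrites page 0, so the CRYPT_MAGIC marker and the crypt data survive the incremental prepare.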

            vlad.lesin Vladislav Lesin added a comment - - edited

I failed to create a stable test case for it. The test scenario I am trying to implement is the following:

1) Disable page flushing.
            2) Create table.
            3) Make full backup.
            4) Insert some data.
5) Start incremental backup.
            6) Execute "OPTIMIZE TABLE" just after the tablespace is copied during backup.
            7) Finish backup.
            8) Prepare full and incremental backup. The latter must fail.

As page 0 is not flushed by the server, its LSN is 0 just after the table is optimized; that is why the page is not copied to the .delta file during the incremental backup, and that is why the tablespace is created during the incremental prepare (there is no tablespace with the same id during prepare), without the CRYPT_MAGIC mark on page 0. I supposed that the subsequent page_is_corrupted() call must fail, as it checks the CRCs of an encrypted page that was not decrypted, because fil_space_read_crypt_data(), invoked on tablespace open, does not find crypt data on page 0.

But the test is unstable because the order of recv_recover_page() calls for different pages is not specified during recovery, and sometimes fil_parse_write_crypt_data() for page 0 is invoked before page_is_corrupted() for any other page; in that case the page is decrypted successfully before the CRC check, as the crypt data has already been set correctly for the tablespace by fil_parse_write_crypt_data().

We could insert a DBUG_EXECUTE_IF() in fil_parse_write_crypt_data() to skip the space->crypt_data initialization when the MLOG_FILE_WRITE_CRYPT_DATA redo record is executed, to make the test stable.

But I am not sure this is correct, because I am not sure that changing data during the recovery of one page in a way that could influence the recovery of other pages is correct. Page recovery is supposed to be independent for different pages, so the fil_parse_write_crypt_data() call from recv_recover_page() looks very suspicious.

            vlad.lesin Vladislav Lesin added a comment - Pushed https://github.com/MariaDB/server/tree/bb-10.2-MDEV-20755-incr-copy-new-files for testing.

There will be conflicts when merging it to 10.4; they are resolved in the https://github.com/MariaDB/server/tree/10.4-MDEV-20755-incr-copy-new-files branch. The fix should also be merged into 10.5+ even though the issue does not affect 10.5, because it processes files created by DDL during backup more efficiently for incremental backup and prepare. As it is only a performance fix for 10.5, there is no need to merge the test case; besides, the test case would always pass on 10.5+, as the DDL log optimization is deprecated in those versions. The 10.5 conflicts are resolved here: https://github.com/MariaDB/server/tree/10.5-MDEV-20755-incr-copy-new-files.


            The fix was reviewed by wlad.


            People

              vlad.lesin Vladislav Lesin
              elenst Elena Stepanova

