Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects versions: 10.2 (EOL), 10.3 (EOL), 10.4 (EOL)
Description
Note: Can't be a duplicate of MDEV-19348 anymore, as it was closed in the previous release set.
10.2 5e65c67c
...
2019-10-05 3:25:49 140645837592384 [Note] InnoDB: Ignoring data file 'test/t1.ibd' with space ID 275, since the redo log references test/t1.ibd with space ID 274.
2019-10-05 3:25:49 140645837592384 [Note] InnoDB: Ignoring data file './test/#sql-2a07_12.ibd' with space ID 275. Another data file called test/t1.ibd exists with the same space ID.
2019-10-05 3:25:49 140645837592384 [Note] InnoDB: Ignoring data file './test/#sql-2a07_12.ibd' with space ID 275. Another data file called test/t1.ibd exists with the same space ID.
2019-10-05 3:25:49 140645837592384 [Warning] InnoDB: Are you sure you are using the right ib_logfiles to start up the database? Log sequence number in the ib_logfiles is 2570499, less than the log sequence number in the first system tablespace file header, 2570528.
2019-10-05 3:25:49 140645837592384 [Note] InnoDB: Starting final batch to recover 963 pages from redo log.
2019-10-05 3:25:49 140645669078784 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace test/t1 page [page id: space=275, page number=3]. You may have to recover from a backup.
2019-10-05 3:25:49 140645669078784 [Note] InnoDB: Uncompressed page, stored checksum in field1 3500439468, calculated checksums for field1: crc32 228036971, innodb 3302731224, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3500439468, calculated checksums for field2: crc32 228036971, innodb 3703238392, none 3735928559, page LSN 0 3445649, low 4 bytes of LSN at page end 3445649, page number (if stored to page already) 3, space id (if created with >= MySQL-4.1.1 and stored already) 275
2019-10-05 3:25:49 140645669078784 [Note] InnoDB: Page may be an index page where index id is 8202515128101958035
2019-10-05 3:25:49 140645669078784 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2019-10-05 3:25:49 140645669078784 [ERROR] InnoDB: Failed to read file 'test/t1.ibd' at offset 3: Page read from tablespace is corrupted.
2019-10-05 3:25:50 140645837592384 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
2019-10-05 3:25:50 140645837592384 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2250] with error Data structure corruption
[00] FATAL ERROR: 2019-10-05 03:25:50 mariabackup: innodb_init() returned 39 (Data structure corruption).
The test case is non-deterministic; run it with --repeat=N and the other options listed below. It usually fails for me within 5 attempts, but this can vary across machines and builds. The probability seems higher with a larger number of tables, columns and data. Running in shm also seems to help.
MTR command line:
perl ./mtr bug2.innodb_page_corrupt2 --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--file-key-management --mysqld=--file-key-management-filename=`pwd`/std_data/keys.txt --mysqld=--plugin-load-add=file_key_management --mem --repeat=5
--source include/have_innodb.inc

CREATE TABLE t1 (a BINARY(8), b CHAR(8), c INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES ('a','foobar',8);

--exec $XTRABACKUP --backup --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQLTEST_VARDIR/log/backup0.log 2>&1

--connect (con1,localhost,root,,)
--let $conid= `SELECT CONNECTION_ID() AS con14`
--delimiter |
--send
BEGIN NOT ATOMIC
  LOOP OPTIMIZE TABLE t1;
  END LOOP;
END |
--delimiter ;

--connection default

--exec $XTRABACKUP --backup --target-dir=$MYSQLTEST_VARDIR/tmp/backup_1 --incremental-basedir=$MYSQLTEST_VARDIR/tmp/backup_0 --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQLTEST_VARDIR/log/backup1.log 2>&1

--eval KILL $conid

--exec $XTRABACKUP --prepare --apply-log-only --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 > $MYSQLTEST_VARDIR/log/prepare0.log 2>&1

--error 0,1
--exec $XTRABACKUP --prepare --apply-log-only --target-dir=$MYSQLTEST_VARDIR/tmp/backup_0 --incremental-dir=$MYSQLTEST_VARDIR/tmp/backup_1 > $MYSQLTEST_VARDIR/log/prepare1.log 2>&1

if ($sys_errno)
{
  --cat_file $MYSQLTEST_VARDIR/log/prepare1.log
  --die "Encountered error upon incremental backup"
}

# Cleanup
DROP TABLE t1;
--rmdir $MYSQLTEST_VARDIR/tmp/backup_0
--rmdir $MYSQLTEST_VARDIR/tmp/backup_1
Fails on 10.2-10.4, on debug, ASAN, and non-debug builds alike.
10.1 fails with something unrelated; I didn't pursue it.
Couldn't reproduce on 10.5.
Attachments
Issue Links
- causes:
  - MDEV-24026 InnoDB: Failing assertion: os_total_large_mem_allocated >= size upon incremental backup (Closed)
- relates to:
  - MDEV-23720 Change innodb_log_optimize_ddl=OFF by default (Closed)
  - MDEV-12699 Improve crash recovery of corrupted data pages (Closed)
  - MDEV-20688 Recovery crashes after unnecessarily reading a corrupted page (Closed)
  - MDEV-21572 buf_page_get_gen() should apply buffered page initialized redo log during recovery (Closed)
  - MDEV-21722 Mariabackup - Preparing incremental backup crash with mysqld got signal 6 (Closed)
Briefly.
The first page of the tablespace is rewritten during the incremental --prepare because the tablespace is "created" instead of the existing tablespace being opened: the tablespace id from the *.meta file does not match the tablespace id of the data file in the full backup directory.
Detailed.
I reproduced the bug locally with the test provided by Elena and did the following analysis.
1) The content of the "corrupted" page read by mariabackup during the incremental --backup and --prepare is the same, but during --backup the page passes validation, and during --prepare it does not.
2) The "corrupted" page is encrypted. In mariabackup if the page is encrypted, buf_page_is_corrupted() is invoked during backup only if --extended_validation option is set. But even if we set the option, buf_page_is_corrupted() returns false on --backup, and true on --prepare. The difference is that during --backup the page is passed to buf_page_is_corrupted() after decryption, but during --prepare the page is not decrypted before it is passed to the function.
3) The page is not decrypted during --prepare because space->crypt_data is NULL. space->crypt_data is NULL because fil_space_read_crypt_data() cannot find CRYPT_MAGIC in the first page. So it turned out that the first page of the tablespace does not contain crypt data after the second --prepare, although it does contain it after the first --prepare, which means the first page was rewritten during the incremental --prepare.
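The lookup in (3) can be sketched as a magic-marker scan over page 0: if the marker is absent (e.g. because the page was rewritten with zeros), crypt_data stays NULL and decryption is never attempted. The magic bytes and offsets below are illustrative, not InnoDB's real on-disk layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Illustrative magic marker (stand-in for InnoDB's CRYPT_MAGIC constant).
constexpr std::array<uint8_t, 6> kCryptMagic = {'s', 0x0E, 0x0C, 'R', 'E', 'T'};

// Sketch of the fil_space_read_crypt_data() idea: locate crypt metadata on
// page 0 by its magic marker; return its offset, or nullopt if absent.
std::optional<size_t> find_crypt_data(const std::vector<uint8_t>& page0) {
    if (page0.size() < kCryptMagic.size()) return std::nullopt;
    for (size_t i = 0; i + kCryptMagic.size() <= page0.size(); ++i)
        if (std::memcmp(page0.data() + i, kCryptMagic.data(),
                        kCryptMagic.size()) == 0)
            return i;  // crypt_data would be parsed starting here
    return std::nullopt;  // crypt_data stays NULL -> page never decrypted
}
```

A page 0 that has been rewritten without the crypt metadata yields nullopt, which is exactly the state observed after the second --prepare.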
4) When innodb_log_optimize_ddl is enabled, the backup_optimized_ddl_op() callback is invoked when MLOG_INDEX_LOAD is read during the incremental --backup. The tablespace is pushed into a special container, then treated as a newly created tablespace in backup_fix_ddl() and re-copied into a *.new file. But as this is an incremental backup, all copied data files are passed through the wf_incremental filter, which creates a *.delta file and copies only the pages with an LSN greater than incremental_lsn. So after backup_fix_ddl() is executed there is a *.new.delta file containing a single page, the "corrupted" page from (2); the file does not contain the first page of the tablespace.
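The wf_incremental behaviour described in (4) reduces to an LSN filter: only pages modified after incremental_lsn land in the delta, so an unmodified page 0 never makes it into *.new.delta. A minimal sketch with illustrative names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Page {
    uint32_t no;   // page number within the tablespace
    uint64_t lsn;  // LSN of the last modification stamped on the page
};

// Sketch of the wf_incremental filter: a *.delta file receives only the
// pages whose LSN exceeds incremental_lsn.
std::vector<Page> build_delta(const std::vector<Page>& file,
                              uint64_t incremental_lsn) {
    std::vector<Page> delta;
    for (const Page& p : file)
        if (p.lsn > incremental_lsn)
            delta.push_back(p);
    return delta;
}
```

With page 0 last written before incremental_lsn and page 3 rewritten after it, the resulting delta holds page 3 only, matching the single-page *.new.delta observed in the analysis.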
5) When the incremental --prepare is executed (the second --prepare in Elena's test), the *.new.delta file is renamed to *.ibd.delta and applied as a regular *.delta file. It turned out that the first page of the tablespace is rewritten when the delta is applied, with the following stack trace:
#0 pfs_os_file_write_func (type=..., name=0x7ffd5d596cb0 "./test/t1.ibd", file=..., buf=0x557322804000, offset=0, n=16384,
src_file=0x55731f873190 "./extra/mariabackup/xtrabackup.cc", src_line=4705)
at ./storage/innobase/include/os0file.ic:413
#1 0x000055731ed20c98 in xb_space_create_file (path=0x7ffd5d596cb0 "./test/t1.ibd", space_id=255, flags=0, file=0x7ffd5d596380)
at ./extra/mariabackup/xtrabackup.cc:4704
#2 0x000055731ed217bf in xb_delta_open_matching_space (dbname=0x7ffd5d5982d0 "test", name=0x7ffd5d5970b0 "t1.ibd", info=...,
real_name=0x7ffd5d596cb0 "./test/t1.ibd", real_name_len=512, success=0x7ffd5d5969e6)
at ./extra/mariabackup/xtrabackup.cc:4897
#3 0x000055731ed21c0e in xtrabackup_apply_delta (dirname=0x5573203c5e00 <xtrabackup_real_incremental_dir> "/dev/shm/b/mysql-test/var/tmp/backup_1/",
dbname=0x7ffd5d5982d0 "test", filename=0x7ffd5d5992b0 "t1.ibd.delta")
at ./extra/mariabackup/xtrabackup.cc:4991
#4 0x000055731ed22e8a in xb_process_datadir (path=0x5573203c5e00 <xtrabackup_real_incremental_dir> "/dev/shm/b/mysql-test/var/tmp/backup_1/",
suffix=0x55731f878b24 ".delta", func=0x55731ed21811 <xtrabackup_apply_delta(char const*, char const*, char const*, void*)>)
at ./extra/mariabackup/xtrabackup.cc:5337
#5 0x000055731ed22f84 in xtrabackup_apply_deltas () at ./extra/mariabackup/xtrabackup.cc:5374
#6 0x000055731ed244d0 in xtrabackup_prepare_func (argv=0x55732279b6b8)
at ./extra/mariabackup/xtrabackup.cc:5597
So the new tablespace is "created" instead of opening the existing tablespace because tablespace id from *.meta file(the tablespace id of *.new.delta file, created on (4)) can not be found in fil_system->spaces (fil_space_get_by_id() invoked from xb_delta_open_matching_space() returns NULL).
fil_system->spaces is filled in xb_load_tablespaces() before applying delta's. It iterates data files in full backup directory and fill fil_system->spaces.
So the first page of the tablespace is rewritten during preparing incremental backup because tablespace is "created" instead of opening the existing tablespace, because tablespace id from *.meta file is not equal to the tablespace id of data file in full backup directory.
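The faulty decision boils down to a lookup by space id. A simplified model of the xb_delta_open_matching_space() branch (fil_system->spaces stands in as a plain map; names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

enum class Action { OpenExisting, CreateNew };

// Sketch of the decision: the space id from the *.meta file is looked up in
// fil_system->spaces; a miss takes the "create" path (xb_space_create_file),
// which rewrites page 0 of the data file.
Action open_matching_space(const std::map<uint32_t, std::string>& spaces,
                           uint32_t meta_space_id) {
    return spaces.count(meta_space_id) ? Action::OpenExisting
                                       : Action::CreateNew;
}
```

With the full backup holding test/t1.ibd under space id 274 and the *.meta file carrying 275, the lookup misses and the destructive "create" path is taken, even though the file itself exists.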
How to fix?
I see the following variants:
1) I suppose that if the tablespace id changed during the incremental backup, all non-zero pages of the tablespace contain the new tablespace id, i.e. all non-zero pages were changed during the incremental backup. So there is no need to produce a delta in backup_fix_ddl() for such data files; we could just copy the whole data file as *.new.
2) In xb_delta_open_matching_space(), check whether the data file exists in the full backup directory; if it does, change its id, and invoke fil_space_create() only for non-existing files.
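Variant (2) can be sketched as a lookup by file name rather than by space id: an existing file adopts the new id in place, so page 0 (and its crypt data) survives. Purely illustrative; these names do not match mariabackup's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

enum class Action { OpenExisting, CreateNew };

// Sketch of fix variant (2): look the data file up by name in the full
// backup first; if found, update its space id and keep the file intact,
// calling fil_space_create() only for genuinely new files.
Action open_matching_space_fixed(std::map<std::string, uint32_t>& spaces_by_name,
                                 const std::string& name,
                                 uint32_t meta_space_id) {
    auto it = spaces_by_name.find(name);
    if (it != spaces_by_name.end()) {
        it->second = meta_space_id;  // adopt the new id; page 0 untouched
        return Action::OpenExisting;
    }
    return Action::CreateNew;        // only here would page 0 be written
}
```

Under this scheme the test scenario (file present as test/t1.ibd with id 274, *.meta carrying 275) takes the non-destructive path.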