[MDEV-27424] mariabackup ignores physically corrupt first pages Created: 2022-01-05  Updated: 2022-03-15  Resolved: 2022-03-05

Status: Closed
Project: MariaDB Server
Component/s: mariabackup
Affects Version/s: None
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Anders Karlsson
Assignee: Anders Karlsson
Resolution: Incomplete
Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-14992 BACKUP: in-server backup Open
relates to MDEV-26326 MDEV-24626 (remove synchronous page0 ... Closed

 Description   

When a table has a corrupt first page, mariabackup silently skips it. The table in question is not backed up at all, no errors are logged, and mariabackup exits without any errors. Recovering from said incremental backup leaves the table missing. The corruption might be caused by a disk error, for example. Innochecksum detects the issue, but mariabackup does not. To reproduce:
Read one byte at an arbitrary position in the first page of the table, note its value, and then write a different value to that position. Here we assume a table "t1" in the "test" database:

$ sudo dd if=/var/lib/mysql/test/t1.ibd bs=1 skip=1056 count=1 status=none | od -A n -t x1
 ab
$ echo -ne "\xac" | sudo dd of=/var/lib/mysql/test/t1.ibd bs=1 seek=1056 count=1 conv=notrunc status=none
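The same byte swap can be sketched in Python, which does the read and the write in one step and returns the original value so it can be restored later. The helper name overwrite_byte is hypothetical and not part of any MariaDB tool; run it only against a disposable test instance.

```python
import os


def overwrite_byte(path, offset, new_value):
    """Replace one byte at `offset` in `path` and return the previous byte.

    Hypothetical helper that mirrors the two dd commands above.
    """
    if not 0 <= new_value <= 0xFF:
        raise ValueError("new_value must fit in one byte")
    # "r+b" updates the file in place without truncating it,
    # which is what conv=notrunc does for dd.
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("offset is beyond the end of the file")
        f.seek(offset)
        f.write(bytes([new_value]))
        f.flush()
        os.fsync(f.fileno())
    return old[0]
```

For the file in this report the call would be overwrite_byte("/var/lib/mysql/test/t1.ibd", 1056, 0xac), and calling it again with the returned value undoes the change.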

Now run mariabackup. The target directory contains no .ibd file for the table t1 in the "test" database; only its .frm file is copied. The exit status from mariabackup is 0 and there are no errors in the log.
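Because the exit status gives no hint, one way to notice the gap is to compare the data directory with the backup target afterwards. The following is a minimal sketch with a hypothetical helper, missing_tablespaces. It assumes an uncompressed, non-streamed backup whose target directory mirrors the layout of the data directory, and it assumes that an incremental backup stores a table as a file with an added ".delta" suffix. Tables created or dropped while the backup runs would show up as false positives.

```python
from pathlib import Path


def missing_tablespaces(datadir, targetdir, suffixes=("", ".delta")):
    """Return relative paths of .ibd files in datadir that targetdir lacks.

    Hypothetical check; `suffixes` lists the name endings under which a
    backed-up tablespace is assumed to appear in the target directory.
    """
    src = Path(datadir)
    dst = Path(targetdir)
    missing = []
    for ibd in sorted(src.rglob("*.ibd")):
        rel = ibd.relative_to(src).as_posix()
        # The table counts as backed up if any expected variant exists.
        if not any((dst / (rel + s)).exists() for s in suffixes):
            missing.append(rel)
    return missing
```

A non-empty result after a backup that reported success would correspond to the situation described here.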

At first I noticed this only for incremental backups, but the situation is the same for full backups.
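Since innochecksum does detect the damage, a check before the backup is one possible stopgap. The sketch below runs a checker command on every .ibd file and collects the files for which it exits with a nonzero status. The helper name failing_tablespaces is hypothetical, and the sketch assumes that innochecksum is on the PATH, that it exits with a nonzero status when a page fails validation, and that the files are not in use by a running server.

```python
import subprocess
from pathlib import Path


def failing_tablespaces(datadir, checker=("innochecksum",)):
    """Run `checker <file>` on each .ibd file under datadir.

    Returns the relative paths for which the checker exited nonzero.
    Hypothetical helper; the checker command is a parameter so that a
    different tool or extra options can be supplied.
    """
    root = Path(datadir)
    failed = []
    for ibd in sorted(root.rglob("*.ibd")):
        result = subprocess.run(
            [*checker, str(ibd)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed.append(ibd.relative_to(root).as_posix())
    return failed
```

Any path in the result would be a reason to stop and investigate before trusting a backup that reports success.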



 Comments   
Comment by Marko Mäkelä [ 2022-01-11 ]

Here is a start of an mtr test case.

CREATE TABLE t1(c INT) ENGINE INNODB;
FLUSH TABLES t1 FOR EXPORT;
--let MYSQLD_DATADIR=`select @@datadir`
perl;
open(F, "+<", "$ENV{MYSQLD_DATADIR}/test/t1.ibd") || die;
seek(F, 1000, 0);
print F "garbage";
close(F);
EOF
 
echo # xtrabackup backup;
let $targetdir=$MYSQLTEST_VARDIR/tmp/backup;
--disable_result_log
exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir;
--enable_result_log
 
UNLOCK TABLES;
DROP TABLE t1;

karlsson, which version are you using? In 10.5, this fails just fine:

2022-01-11 21:39:42 0 [Note] InnoDB: Checksum mismatch in datafile: ./test/t1.ibd, Space ID:5, Flags: 21
[00] FATAL ERROR: 2022-01-11 21:39:42 Failed to validate first page of the file test/t1, error 39

For 10.6 or later, rr record can be added after exec to get an rr replay trace to find out why the file is not being copied. The corrupted first page is being read and found to be invalid in the following call trace:

#2  buf_page_is_corrupted (check_lsn=check_lsn@entry=false, read_buf=0x5559c7e2c000 "", fsp_flags=<optimized out>) at /mariadb/10.8/storage/innobase/buf/buf0buf.cc:611
#3  0x00005559c65fe596 in Datafile::validate_first_page (this=this@entry=0x5559c7d383d0) at /mariadb/10.8/storage/innobase/fsp/fsp0file.cc:539
#4  0x00005559c5d647d0 in xb_load_single_table_tablespace (dirname=<optimized out>, filname=<optimized out>, is_remote=<optimized out>, skip_node_page0=<optimized out>, defer_space_id=0)
    at /mariadb/10.8/extra/mariabackup/xtrabackup.cc:3401
#5  0x00005559c5d65b05 in enumerate_ibd_files (callback=callback@entry=0x5559c5d643f3 <xb_load_single_table_tablespace(char const*, char const*, bool, bool, uint32_t)>)
    at /mariadb/10.8/extra/mariabackup/xtrabackup.cc:3791

The following code in Datafile::validate_first_page() might not be entirely appropriate:

		if (recv_recovery_is_on()
		    || srv_operation == SRV_OPERATION_BACKUP) {
			m_defer= true;
			return DB_SUCCESS;
		}
