[MDEV-16519] mariabackup --backup fails with concurrent RENAME TABLE Created: 2018-06-19  Updated: 2020-11-10  Resolved: 2018-06-22

Status: Closed
Project: MariaDB Server
Component/s: Backup
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: 10.2.16, 10.3.8

Type: Bug
Priority: Major
Reporter: Marko Mäkelä
Assignee: Vladislav Vaintroub
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-16791 mariabackup : allow consistent backup... Closed
Relates
relates to MDEV-24184 InnoDB RENAME TABLE recovery failure ... Closed

 Description   

If a table is being both renamed and written to while Mariabackup is trying to back up the server, the backup may fail. Here is a simple test case:

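# Number of insert + rename iterations.
let $n=10000000;

# This only prints a backup command line; the backup itself is started by hand.
exec echo $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$basedir;
eval create table t$n (a serial) engine=innodb;

--disable_query_log
# Insert one row, then rename the table to the next lower number, down to t0.
while ($n) {
eval insert into t$n values();
let $o=$n;
dec $n;
eval rename table t$o to t$n;
}

--enable_query_log
drop table t0;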

You have to run mariabackup --backup by hand, concurrently with the test, while the test itself is running under, say, ./mtr mariabackup.rename. For me, the backup fails in the file-copying phase in both 10.1 and 10.2. Here is a 10.2 invocation with the --lock-ddl-per-table option, which is supposed to prevent DDL operations:

/dev/shm/10.2/extra/mariabackup/mariabackup --defaults-file=/dev/shm/10.2/mysql-test/var/my.cnf --backup --lock-ddl-per-table --target-dir=/dev/shm/bu

10.2 c55de8d40bba29503773a6a56d6f13f19ca7e339

180619 10:30:24 Connecting to MySQL server host: localhost, user: root, password: set, port: 16000, socket: /dev/shm/10.2/mysql-test/var/tmp/mysqld.1.sock
180619 10:30:24 Locking MDL for `mysql`.`innodb_table_stats`
180619 10:30:24 Locking MDL for `mysql`.`innodb_index_stats`
180619 10:30:24 Locking MDL for `test`.`t9997860`
180619 10:30:24 [01] Copying ibdata1 to /dev/shm/bu/ibdata1
180619 10:30:24 [01]        ...done
180619 10:30:24 [01] Copying ./mysql/innodb_table_stats.ibd to /dev/shm/bu/mysql/innodb_table_stats.ibd
180619 10:30:24 [01]        ...done
180619 10:30:24 [01] Copying ./mysql/innodb_index_stats.ibd to /dev/shm/bu/mysql/innodb_index_stats.ibd
180619 10:30:24 [01]        ...done
[01] mariabackup: error: cannot stat ./test/t9997901.ibd
[01] mariabackup: Error: xtrabackup_copy_datafile() failed.
[01] mariabackup: Error: failed to copy datafile.

Note that the log claims that an MDL was acquired on the table t9997860. Why did we then try to copy t9997901.ibd, a file with a much older name, instead of t9997860.ibd? (The number counts backwards, so t9997901 is the older name.) Also, maybe we should actually check that the MDL acquisition worked? I would not be surprised if the table had already been renamed to something else while we were trying to acquire the MDL.
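The suspected race can be reproduced in miniature outside the server. The following Python sketch is only a toy model: the temporary directories, the file names, and the helper copy_listed_files are hypothetical and do not correspond to mariabackup internals. It takes a snapshot of the file names, renames a file afterwards, and shows that a copy driven by the stale snapshot cannot stat the old name.

```python
import os
import shutil
import tempfile


def copy_listed_files(src_dir, dst_dir, names):
    """Copy each file in `names`; return the names that no longer exist.

    Hypothetical helper: a toy stand-in for a copy phase that works
    from a file list gathered earlier.
    """
    missing = []
    for name in names:
        src = os.path.join(src_dir, name)
        try:
            os.stat(src)  # fails if the file was renamed after listing
        except FileNotFoundError:
            missing.append(name)
            continue
        shutil.copy(src, os.path.join(dst_dir, name))
    return missing


def demo():
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        # A single stand-in "tablespace file" under its current name.
        open(os.path.join(src, "t3.ibd"), "w").close()

        # Step 1: snapshot the file names (stale as soon as a rename runs).
        snapshot = sorted(os.listdir(src))

        # Step 2: the equivalent of RENAME TABLE t3 TO t2 happens in between.
        os.rename(os.path.join(src, "t3.ibd"), os.path.join(src, "t2.ibd"))

        # Step 3: copying by the stale snapshot misses the renamed file.
        return copy_listed_files(src, dst, snapshot)


if __name__ == "__main__":
    print(demo())
```

In this model the copy phase reports the old name as missing instead of aborting, which is one possible way to avoid terminating on the first failed stat; whether that is appropriate for a real backup is a separate question.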

Even better, backup with concurrent RENAME TABLE should work without any --lock-ddl-per-table option. The file copying should not lead to termination this easily, and there could be some connection to the MLOG_FILE_RENAME2 records that are being parsed from the redo log.
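To illustrate the idea of following renames that are recorded in the redo log, here is a minimal Python sketch. It assumes that the rename records have already been parsed into an ordered list of (old name, new name) pairs. The function resolve_current_name and that list format are hypothetical, and the actual layout of MLOG_FILE_RENAME2 records is not modeled.

```python
def resolve_current_name(listed_name, rename_records):
    """Follow a chain of renames, in log order, starting from `listed_name`.

    `rename_records` is an assumed, already-parsed list of
    (old_name, new_name) pairs in the order they were logged.
    """
    current = listed_name
    for old, new in rename_records:
        if old == current:
            current = new
    return current


# Hypothetical records for: RENAME t3 TO t2, then RENAME t2 TO t1.
records = [
    ("test/t3.ibd", "test/t2.ibd"),
    ("test/t2.ibd", "test/t1.ibd"),
]

if __name__ == "__main__":
    print(resolve_current_name("test/t3.ibd", records))
```

Applying the records strictly in log order matters: a name that was freed by one rename can be reused by a later one, so matching against the current name at each step avoids following the wrong chain.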

The backup could fail in different ways. The reason I tested this was that in one case, mariabackup --prepare reported that some files were missing. That is, the backup did not terminate with an error, but it failed to produce a complete result.



 Comments   
Comment by Vladislav Vaintroub [ 2018-06-22 ]

Fixed so that --lock-ddl-per-table now fails in such a situation.
The odds of a race condition are really tiny there.

marko, yes, it is possible to parse the redo log, but I would not do that; I think the
implementation effort is out of proportion to the importance of RENAME.

Rather, we need a more convenient single "LOCK for BACKUP", like in MySQL 8.0 or Percona, that blocks all DDLs.
