[MDEV-23982] Mariabackup hangs during backup Created: 2020-10-19  Updated: 2020-10-19  Resolved: 2020-10-19

Status: Closed
Project: MariaDB Server
Component/s: Backup
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.2.35, 10.3.26, 10.4.16, 10.5.7

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: backup, hang

Issue Links:
Problem/Incident
is caused by MDEV-13318 Crash recovery failure after the serv... Closed

 Description   

MDEV-13318 introduced a condition to Mariabackup that can cause it to hang if the server goes idle after a particular kind of redo log write where the last written redo log block has no payload after the 12-byte header. In log0recv.cc we would treat blocks with less than 12 bytes of length as corrupted, but a block with exactly 12 bytes is fine. However, Mariabackup would disregard blocks that have exactly 12 bytes of length.

The problem was found in the MDEV-23855 branch via a test failure like this:

mariabackup.incremental_encrypted '4k,innodb' w2 [ fail ]  timeout after 900 seconds
        Test ended at 2020-10-19 08:16:36
 
Test case timeout after 900 seconds
 
== /dev/shm/var/2/log/incremental_encrypted.log == 
call mtr.add_suppression("InnoDB: New log files created");
CREATE TABLE t(i INT) ENGINE INNODB ENCRYPTED=YES;
INSERT INTO t VALUES(1);
# Create full backup , modify table, then create incremental/differential backup
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
INSERT INTO t VALUES(2);
SELECT * FROM t;
i
1
2
 
 == /dev/shm/var/2/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/dev/shm/var/tmp/2/mysqld.1.sock' (111)

The MariaDB server process is idle, and the reason for the hang was that mariabackup --backup in xtrabackup_copy_logfile() would never reach the end of the log, and instead kept complaining that the last block is corrupted, which is it not:

[00] 2020-10-19 12:56:20 >> log scanned up to (59392)
[00] 2020-10-19 12:56:21 garbage block: 59392,12
[00] 2020-10-19 12:56:21 >> log scanned up to (59392)
[00] 2020-10-19 12:56:22 garbage block: 59392,12
[00] 2020-10-19 12:56:22 >> log scanned up to (59392)

The fix is simple: accept a log block with no payload bytes:

diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc
index 13c10c2921d..8ad9af9a9c9 100644
--- a/extra/mariabackup/xtrabackup.cc
+++ b/extra/mariabackup/xtrabackup.cc
@@ -2650,7 +2650,7 @@ static lsn_t xtrabackup_copy_log(lsn_t start_lsn, lsn_t end_lsn, bool last)
 			/* We got a full log block. */
 			scanned_lsn += data_len;
 		} else if (data_len >= log_sys.trailer_offset()
-			   || data_len <= LOG_BLOCK_HDR_SIZE) {
+			   || data_len < LOG_BLOCK_HDR_SIZE) {
 			/* We got a garbage block (abrupt end of the log). */
 			msg(0,"garbage block: " LSN_PF ",%zu",scanned_lsn, data_len);
 			break;


Generated at Thu Feb 08 09:26:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.