[MDEV-27621] Backup fails with FATAL ERROR: Was only able to copy log from .. to .., not .. Created: 2022-01-25  Updated: 2022-02-01  Resolved: 2022-01-27

Status: Closed
Project: MariaDB Server
Component/s: mariabackup
Affects Version/s: 10.8
Fix Version/s: 10.8.1

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-14425 Change the InnoDB redo log format to ... Closed
Relates
relates to MDEV-14992 BACKUP: in-server backup Open

 Description   

The test performs a number of tiny updates on a single table, and after they are done, runs mariabackup --backup. On earlier versions, it works fine, but on 10.8 it fails almost reliably with

10.8 05050867

[00] FATAL ERROR: 2022-01-25 20:17:19 Was only able to copy log from 25657966 to 27195457, not 27418414; try increasing innodb_log_file_size

The value of innodb_log_file_size is indeed small, in order to minimize the test, but the issue is scalable. I started from the default 96M, with several tables with more columns, more keys, and bigger number of UPDATEs, but otherwise very similar load. I suppose it scales further up the same way, I didn't try.

It fails for me most of the times, both in shm and on SSD, but it's not deterministic and the result can vary on different machines and builds.
If it doesn't fail for you, try to re-run. If it continues not to fail, try to increase the number of cycles. It also seems to fail more willingly on faster (non-debug) builds.

Possibly it can be made more stable by more precise handling of the InnoDB log.

--source include/have_innodb.inc
 
CREATE TABLE t (pk int,
 c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int,
 c8 int, c9 int, c10 int, c11 int, c12 int, c13 int, c14 int,
 c15 int, c16 int, c17 int, c18 int, c19 int, c20 int, c21 int,
 c22 int, c23 int, c24 int, c25 int, c26 int, c27 int, c28 int,
 c29 int, c30 int, c31 int, c32 int, c33 int, c34 int, c35 int,
 c36 int, c37 int, c38 int, c39 int, c40 int, c41 int, c42 int,
 c43 int, c44 int, c45 int, c46 int, c47 int, c48 int, c49 int,
 primary key (pk),
 key (c1), key (c2), key (c3), key (c4), key (c5), key (c6), key (c7),
 key (c8), key (c9), key (c10), key (c11), key (c12), key (c13), key (c14),
 key (c15), key (c16), key (c17), key (c18), key (c19), key (c20), key (c21),
 key (c22), key (c23), key (c24), key (c25), key (c26), key (c27), key (c28),
 key (c29), key (c30), key (c31), key (c32), key (c33), key (c34), key (c35),
 key (c36), key (c37), key (c38), key (c39), key (c40), key (c41), key (c42),
 key (c43), key (c44), key (c45), key (c46), key (c47), key (c48), key (c49)
) ENGINE=InnoDB;
INSERT INTO t (pk) VALUES (1);
 
--let $restart_parameters= --innodb-log-file-size=2M
--source include/restart_mysqld.inc
 
SET rand_seed1= 1, rand_seed2= 1;
--disable_query_log
 
--let $cycles= 2000
while ($cycles)
{
  --let $columns= 49
  while ($columns)
  {
    --eval UPDATE t AS alias1 SET alias1.c$columns = ROUND(RAND())*255 - 128;
    --dec $columns
  }
  UPDATE t AS alias1 SET alias1.pk = ROUND(RAND())*255 - 128;
  --dec $cycles
}
--enable_query_log
 
--error 0,1
--exec $XTRABACKUP --backup --target-dir=$MYSQL_TMP_DIR/mbackup_dir --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQL_TMP_DIR/mbackup.log 2>&1
if ($sys_errno)
{
  --echo ############ Tail of MariaBackup error log: ############
  --exec tail $MYSQL_TMP_DIR/mbackup.log
  --echo ########################################################
  --die "Backup failed"
}
 
# Cleanup
DROP TABLE t;

The failure started happening on 10.8 after this commit:

commit 685d958e38b825ad9829be311f26729cccf37c46 (HEAD)
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Fri Jan 21 16:03:47 2022 +0200
 
    MDEV-14425 Improve the redo log for concurrency



 Comments   
Comment by Marko Mäkelä [ 2022-01-26 ]

The test works for me with the cycles values of 2000, 5000, 10000 when running on /dev/shm and using libpmem (mmap interface to the server’s 2-megabyte ib_logfile0).

I then switched to the pwrite and pread based interface:

mkdir /run/user/$UID/var
rm -fr var
ln -s /run/user/$UID/var var
./mtr mariabackup.MDEV-27621

Also with this (and cycles=10000) the test passes for me:

10.8 93756c992f18e86e380cb02078c4ba4cea123e5a

mariabackup.MDEV-27621 'innodb'              [ pass ]  121680

On /dev/shm (fake PMEM) it was slightly faster, again with cycles=10000:

mariabackup.MDEV-27621 'innodb'              [ pass ]  111996

On NVMe, I saw about 16 MiB/s writes and 2.5 MiB/s reads happening while the test is running. And the test finally failed (already with cycles=2000):

[00] 2022-01-26 15:15:36 Retrying read of log at LSN=135608916
[00] FATAL ERROR: 2022-01-26 15:15:36 Was only able to copy log from 134574484 to 135608916, not 136398636; try increasing innodb_log_file_size

The likely reason why you see this happening after MDEV-14425 is that some memset() and my_crc32c() calls that were executed inside write_lock or flush_lock were removed. Even though this is a single-threaded test, it could be that removing that memset() overhead increased the relative speed of the server compared to the backup process.

I tried once more, with innodb_flush_method=O_DSYNC so that O_DIRECT I/O will be used on the log file. Again, I am seeing about the same write and read rates.

[00] FATAL ERROR: 2022-01-26 15:22:34 Was only able to copy log from 134513958 to 135608964, not 136396766; try increasing innodb_log_file_size

I do not think that there is any easy fix for this. Because there are much more writes happening than reads, the issue cannot even be that mariadb-backup --backup would be unnecessarily re-reading too large portions of the log file.

elenst, when you use a larger redo log (and larger workload), will the test pass if you invoke mariadb-backup with a larger innodb_log_buffer_size? The minimum value is 2MiB (changed in MDEV-14425). The maximum useful value would be the innodb_log_file_size of the server.

Ever since I heard about the closed-source InnoDB Hot Backup product (ibbackup) in 2003 or 2004, I thought that the inter-process communication via the database log and data files is error-prone and broken by design. This is just another proof of that.

As far as I can tell, the only reason why the separate ibbackup process was created was licensing: Innobase Oy wanted to get additional revenue directly from end users, by offering a closed-source online backup program. Later, Percona XtraBackup was created. It implemented the fairly small part of the closed-source code. At that time, some #ifdef UNIV_HOTBACKUP hints were hepfully available in the InnoDB storage engine code. And yet later, mariabackup or mariadb-backup was created out of xtrabackup, initially, to support file format changes that are unique to MariaDB.

This problem would ultimately be fixed by implementing server-side backup (MDEV-14992), so that the same server process will produce both the main (circular) server log and a replica (append-only) log that will be needed to recover or prepare the backup.

Comment by Elena Stepanova [ 2022-01-26 ]

Elena Stepanova, when you use a larger redo log (and larger workload), will the test pass if you invoke mariadb-backup with a larger innodb_log_buffer_size? The minimum value is 2MiB (changed in MDEV-14425). The maximum useful value would be the innodb_log_file_size of the server.

Normal non-MTR-downgraded tests run (and now frequently fail) with default innodb_log_file_size and innodb_log_buffer_size, which for 10.8 are now 96M and 16M accordingly. Same tests running with default values never failed this way before MDEV-14425. Which part of MDEV-14425 caused the difference, I can't say.

Comment by Marko Mäkelä [ 2022-01-26 ]

I tested the following modification to the test case:

@@ -19,13 +19,13 @@
 ) ENGINE=InnoDB;
 INSERT INTO t (pk) VALUES (1);
 
---let $restart_parameters= --innodb-log-file-size=2M
+--let $restart_parameters= --innodb-log-file-size=10M
 --source include/restart_mysqld.inc
 
 SET rand_seed1= 1, rand_seed2= 1;
 --disable_query_log
 
---let $cycles= 2000
+--let $cycles= 10000
 while ($cycles)
 {
   --let $columns= 49
@@ -40,7 +40,7 @@
 --enable_query_log
 
 --error 0,1
---exec $XTRABACKUP --backup --target-dir=$MYSQL_TMP_DIR/mbackup_dir --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQL_TMP_DIR/mbackup.log 2>&1
+--exec $XTRABACKUP --backup --innodb-log-buffer-size=10M --target-dir=$MYSQL_TMP_DIR/mbackup_dir --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQL_TMP_DIR/mbackup.log 2>&1
 if ($sys_errno)
 {
   --echo ############ Tail of MariaBackup error log: ############

That test passed on my NVMe, using a gcc -Og debug build:

10.8 93756c992f18e86e380cb02078c4ba4cea123e5a

mariabackup.MDEV-27621 'innodb'          [ pass ]  120690

Note: I saw a great reduction of the write rate, possibly because the server would be less busy with checkpoint flushing due to the ridiculously small redo log size.

We should keep in mind that normally, we recommend the buffer pool size to be something like 80% of the available RAM, and (now that MDEV-26784 is fixed) the redo log size to be at least equal to the buffer pool size. Obviously we cannot let a concurrently running backup process to hog too much memory for its innodb_log_buffer_size. We would just want some sanity check that a backup is still doable with reasonable file and buffer size parameters.

Perhaps mleich can play with this further? I think that we need reasonable parameters, let’s say, at least innodb_log_file_size=96M which I believe is the default. Increase the cycles until it starts to fail (if it starts to fail at all), and then see what is the minimum value of --innodb-log-buffer-size with which mariadb-backup will start to work. Its default value 2M can definitely be too small. The server’s innodb_log_buffer_size should not matter.

Comment by Elena Stepanova [ 2022-01-26 ]

I don't know what you mean by "default value 2M". As I said, the default value of -innodb-log-buffer-size is 16M, assuming that mariabackup uses the same defaults as the server.
And I apparently have to repeat again:
The problem has nothing to do with 2M log file size, if that's the only excuse for not fixing it then it doesn't fit. it is happening with the default server options, which are 96M/16M.

Comment by Marko Mäkelä [ 2022-01-26 ]

For mariadb-backup, the default innodb_log_buffer_size (which as of MDEV-14425 limits how much log will be be pread() at a time during backup) is only 2MiB. The server-side parameter does not matter there. However, the server-side innodb_log_file_size and possibly innodb_buffer_pool_size matter.

Comment by Elena Stepanova [ 2022-01-26 ]

Okay. In my tests it also fails with mariabackup --backup --innodb-log-buffer-size=96M, plus at least server-side

  • default innodb_log_file_size and innodb_buffer_pool_size;
  • innodb-log-file-size=256M and innodb-buffer-pool-size=128M;
  • innodb-log-file-size=256M innodb-buffer-pool-size=1G

With this, I'm passing over further experiments to mleich.

Comment by Marko Mäkelä [ 2022-01-27 ]

I can confirm that this is indeed directly caused by an implementation deficiency in MDEV-14425, and not indirectly due to the design problem that MDEV-14992 would address later.

The problem is that when the circular ib_logfile0 wraps around, then mariadb-backup --backup will only (re)read the portion from the current position. I added rr record to the exec line in the provided script and on the second try, it failed with an extreme case. We were re-reading the last 512 bytes, and each time got PREMATURE_EOF from the log parser, for trying to parse the last 16 bytes of that block.

diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc
index a96fdf36132..55f90db61d3 100644
--- a/extra/mariabackup/xtrabackup.cc
+++ b/extra/mariabackup/xtrabackup.cc
@@ -3002,10 +3002,21 @@ static bool xtrabackup_copy_logfile()
                                   recv_sys.offset);
         source_offset&= ~block_size_1;
         size_t size{log_sys.buf_size - recv_sys.len};
-        if (source_offset + size > log_sys.file_size)
-          size= static_cast<size_t>(log_sys.file_size - source_offset);
-        ut_ad(size <= log_sys.buf_size);
-        log_sys.log.read(source_offset, {log_sys.buf, size});
+        if (UNIV_UNLIKELY(source_offset + size > log_sys.file_size))
+        {
+          const size_t first{size_t(log_sys.file_size - source_offset)};
+          ut_ad(first <= log_sys.buf_size);
+          log_sys.log.read(source_offset, {log_sys.buf, first});
+          size-= first;
+          if (log_sys.START_OFFSET + size > source_offset)
+            size= size_t(source_offset - log_sys.START_OFFSET);
+          if (size)
+            log_sys.log.read(log_sys.START_OFFSET,
+                             {log_sys.buf + first, size});
+          size+= first;
+        }
+        else
+          log_sys.log.read(source_offset, {log_sys.buf, size});
         recv_sys.len= size;
       }
 

With the above patch, the simple test also passed on my NVMe, with 12 tests running in parallel. The read rates were about 1/5 of the write rates throughout the test.

mariabackup.too-slow-MDEV-27621 'innodb' w12 [ 10 pass ]  16268
--------------------------------------------------------------------------
The servers were restarted 12 times
Spent 1927.289 of 171 seconds executing testcases
 
Completed: All 120 tests were successful.

Comment by Marko Mäkelä [ 2022-01-27 ]

A test of the code before the above fix produced a backup with an invalid log file:

2022-01-27 14:35:52 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT(89018307) at 144541792
2022-01-27 14:35:52 0 [ERROR] InnoDB: Log scan aborted at LSN 144541792
[00] 2022-01-27 14:35:52 mariadb-backup: srv_start() returned 11 (Generic error).

The problem with the log file is that at the offset where we expect to find optional FILE_MODIFY records and a FILE_CHECKPOINT record that points to the checkpoint LSN, we instead find some data that could be in the middle of some mini-transaction. The data is in clear text and does appear in several .ibd files.

A possible explanation is that due to this bug, mariadb-backup --backup kept retrying a read of the (say) 2048 last bytes of the server’s ib_logfile0 until the server had generated a multiple of (innodb_log_file_size-12288)·n bytes more log, and mariadb-backup would finally observe that a mini-transaction luckily ended at the very end of that block. This then allowed the log file parsing in the backup to wrap around to byte offset 12288. The effect of this would be that (innodb_log_file_size-12288)·n bytes were omitted from the backed-up log.

Generated at Thu Feb 08 09:54:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.