Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.8(EOL)
-
None
Description
The test performs a number of tiny updates on a single table, and after they are done, runs mariabackup --backup. On earlier versions, it works fine, but on 10.8 it fails almost reliably with
10.8 05050867 |
[00] FATAL ERROR: 2022-01-25 20:17:19 Was only able to copy log from 25657966 to 27195457, not 27418414; try increasing innodb_log_file_size
|
The value of innodb_log_file_size is indeed small, in order to minimize the test, but the issue is scalable. I started from the default 96M, with several tables with more columns, more keys, and bigger number of UPDATEs, but otherwise very similar load. I suppose it scales further up the same way, I didn't try.
It fails for me most of the times, both in shm and on SSD, but it's not deterministic and the result can vary on different machines and builds.
If it doesn't fail for you, try to re-run. If it continues not to fail, try to increase the number of cycles. It also seems to fail more willingly on faster (non-debug) builds.
Possibly it can be made more stable by more precise handling of the InnoDB log.
--source include/have_innodb.inc
|
|
CREATE TABLE t (pk int, |
c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, |
c8 int, c9 int, c10 int, c11 int, c12 int, c13 int, c14 int, |
c15 int, c16 int, c17 int, c18 int, c19 int, c20 int, c21 int, |
c22 int, c23 int, c24 int, c25 int, c26 int, c27 int, c28 int, |
c29 int, c30 int, c31 int, c32 int, c33 int, c34 int, c35 int, |
c36 int, c37 int, c38 int, c39 int, c40 int, c41 int, c42 int, |
c43 int, c44 int, c45 int, c46 int, c47 int, c48 int, c49 int, |
primary key (pk), |
key (c1), key (c2), key (c3), key (c4), key (c5), key (c6), key (c7), |
key (c8), key (c9), key (c10), key (c11), key (c12), key (c13), key (c14), |
key (c15), key (c16), key (c17), key (c18), key (c19), key (c20), key (c21), |
key (c22), key (c23), key (c24), key (c25), key (c26), key (c27), key (c28), |
key (c29), key (c30), key (c31), key (c32), key (c33), key (c34), key (c35), |
key (c36), key (c37), key (c38), key (c39), key (c40), key (c41), key (c42), |
key (c43), key (c44), key (c45), key (c46), key (c47), key (c48), key (c49) |
) ENGINE=InnoDB;
|
INSERT INTO t (pk) VALUES (1); |
|
--let $restart_parameters= --innodb-log-file-size=2M
|
--source include/restart_mysqld.inc
|
|
SET rand_seed1= 1, rand_seed2= 1; |
--disable_query_log
|
|
--let $cycles= 2000
|
while ($cycles)
|
{
|
--let $columns= 49 |
while ($columns)
|
{
|
--eval UPDATE t AS alias1 SET alias1.c$columns = ROUND(RAND())*255 - 128; |
--dec $columns |
}
|
UPDATE t AS alias1 SET alias1.pk = ROUND(RAND())*255 - 128; |
--dec $cycles |
}
|
--enable_query_log
|
|
--error 0,1
|
--exec $XTRABACKUP --backup --target-dir=$MYSQL_TMP_DIR/mbackup_dir --protocol=tcp --port=$MASTER_MYPORT --user=root > $MYSQL_TMP_DIR/mbackup.log 2>&1
|
if ($sys_errno) |
{
|
--echo ############ Tail of MariaBackup error log: ############ |
--exec tail $MYSQL_TMP_DIR/mbackup.log |
--echo ######################################################## |
--die "Backup failed" |
}
|
|
# Cleanup
|
DROP TABLE t; |
The failure started happening on 10.8 after this commit:
commit 685d958e38b825ad9829be311f26729cccf37c46 (HEAD)
|
Author: Marko Mäkelä <marko.makela@mariadb.com>
|
Date: Fri Jan 21 16:03:47 2022 +0200
|
|
MDEV-14425 Improve the redo log for concurrency
|
Attachments
Issue Links
- is caused by
-
MDEV-14425 Change the InnoDB redo log format to reduce write amplification
-
- Closed
-
- relates to
-
MDEV-14992 BACKUP: in-server backup
-
- Open
-
The test works for me with the cycles values of 2000, 5000, 10000 when running on /dev/shm and using libpmem (mmap interface to the server’s 2-megabyte ib_logfile0).
I then switched to the pwrite and pread based interface:
mkdir /run/user/$UID/var
rm -fr var
ln -s /run/user/$UID/var var
./mtr mariabackup.MDEV-27621
Also with this (and cycles=10000) the test passes for me:
10.8 93756c992f18e86e380cb02078c4ba4cea123e5a
mariabackup.MDEV-27621 'innodb' [ pass ] 121680
On /dev/shm (fake PMEM) it was slightly faster, again with cycles=10000:
mariabackup.MDEV-27621 'innodb' [ pass ] 111996
On NVMe, I saw about 16 MiB/s writes and 2.5 MiB/s reads happening while the test is running. And the test finally failed (already with cycles=2000):
[00] 2022-01-26 15:15:36 Retrying read of log at LSN=135608916
[00] FATAL ERROR: 2022-01-26 15:15:36 Was only able to copy log from 134574484 to 135608916, not 136398636; try increasing innodb_log_file_size
The likely reason why you see this happening after
MDEV-14425is that some memset() and my_crc32c() calls that were executed inside write_lock or flush_lock were removed. Even though this is a single-threaded test, it could be that removing that memset() overhead increased the relative speed of the server compared to the backup process.I tried once more, with innodb_flush_method=O_DSYNC so that O_DIRECT I/O will be used on the log file. Again, I am seeing about the same write and read rates.
[00] FATAL ERROR: 2022-01-26 15:22:34 Was only able to copy log from 134513958 to 135608964, not 136396766; try increasing innodb_log_file_size
I do not think that there is any easy fix for this. Because there are much more writes happening than reads, the issue cannot even be that mariadb-backup --backup would be unnecessarily re-reading too large portions of the log file.
elenst, when you use a larger redo log (and larger workload), will the test pass if you invoke mariadb-backup with a larger innodb_log_buffer_size? The minimum value is 2MiB (changed in
MDEV-14425). The maximum useful value would be the innodb_log_file_size of the server.Ever since I heard about the closed-source InnoDB Hot Backup product (ibbackup) in 2003 or 2004, I thought that the inter-process communication via the database log and data files is error-prone and broken by design. This is just another proof of that.
As far as I can tell, the only reason why the separate ibbackup process was created was licensing: Innobase Oy wanted to get additional revenue directly from end users, by offering a closed-source online backup program. Later, Percona XtraBackup was created. It implemented the fairly small part of the closed-source code. At that time, some #ifdef UNIV_HOTBACKUP hints were hepfully available in the InnoDB storage engine code. And yet later, mariabackup or mariadb-backup was created out of xtrabackup, initially, to support file format changes that are unique to MariaDB.
This problem would ultimately be fixed by implementing server-side backup (MDEV-14992), so that the same server process will produce both the main (circular) server log and a replica (append-only) log that will be needed to recover or prepare the backup.