Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
11.4.4
-
None
-
None
-
RHEL-9
Using the repositories from mariadb.com
Description
We recently upgraded a three-node master-slave cluster from 10.11 to 11.4. The cluster runs 100+ databases with ~25,000 tables, but the overall data size is only about 100 GB. As it mostly serves business operations, the cluster is busier during the daytime and sees less load at night and on weekends. Each node takes one backup with mariadb-backup per day, shifted by 8 hours (i.e. node A takes a backup at midnight, node B at 8 AM, node C at 4 PM).
Until the upgrade this worked without any issues. After the upgrade, backups frequently (up to 50% of the time) fail during the --backup stage with a message like this:
[00] 2024-11-28 00:12:53 Finished backing up non-InnoDB tables and files
[00] 2024-11-28 00:12:53 Waiting for log copy thread to read lsn 7676153454539
[00] 2024-11-28 00:12:53 Retrying read of log at LSN=7676152544983
[00] 2024-11-28 00:12:54 Retrying read of log at LSN=7676152544983
[00] 2024-11-28 00:12:55 Retrying read of log at LSN=7676152544983
[00] 2024-11-28 00:12:57 Retrying read of log at LSN=7676152544983
[00] 2024-11-28 00:12:58 Retrying read of log at LSN=7676152544983
[00] 2024-11-28 00:12:58 Was only able to copy log from 7675927450389 to 7676152544983, not 7676153454539; try increasing innodb_log_file_size
[00] 2024-11-28 00:12:58 Retrying read of log at LSN=7676152544983
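Since an InnoDB LSN is, to my understanding, a byte offset into the redo log stream, the LSNs in the failure message can be converted into sizes. The following is a minimal Python sketch, assuming the message format shown in the excerpt; the helper name `parse_copy_failure` is made up for illustration and is not part of mariadb-backup.

```python
import re

# Matches the failure line printed by mariadb-backup in the excerpt above.
FAILURE_RE = re.compile(
    r"Was only able to copy log from (\d+) to (\d+), not (\d+)"
)

def parse_copy_failure(line):
    """Return (copied_bytes, missing_bytes), or None if the line does not match.

    Hypothetical helper; assumes that LSN differences are byte counts.
    """
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    start, reached, wanted = (int(g) for g in m.groups())
    return reached - start, wanted - reached

line = ("[00] 2024-11-28 00:12:58 Was only able to copy log from "
        "7675927450389 to 7676152544983, not 7676153454539; "
        "try increasing innodb_log_file_size")
copied, missing = parse_copy_failure(line)
print(f"copied  {copied} bytes ({copied / 2**20:.1f} MiB)")
print(f"missing {missing} bytes ({missing / 2**10:.1f} KiB)")
```

For the excerpt this works out to roughly 215 MiB of log copied over the whole backup and less than 1 MiB still missing, both well below the 1 GB log size.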
What we did so far:
- Increased the InnoDB log four-fold, from its previous value of 256 MB to 1 GB. This did not help.
- Took notice of MDEV-34850, but it is set to "fixed in 11.4.4" and we have exactly this version:
[root@cgdcpsql1 mysql]# rpm -qa | grep MariaDB
MariaDB-shared-11.4.4-1.el9.x86_64
MariaDB-common-11.4.4-1.el9.x86_64
MariaDB-client-11.4.4-1.el9.x86_64
MariaDB-server-11.4.4-1.el9.x86_64
MariaDB-backup-11.4.4-1.el9.x86_64
On a side note: we have the binlog size also set to 1 GB and we typically see 2 rotations per 24 hours. As the whole --backup stage prior to the error takes 10-15 minutes, it is hard to believe that a full gigabyte of changes landed during this short time window, and even if it did, it would have caused a binlog rotation, which we don't see. To match the above excerpt from the mariadb-backup output, here are the binlogs on the same machine; the failed backup started at exactly 00:00 (midnight) on Nov 28, while binlog 1536 was opened at 11:10 on Nov 27 and was rotated at 08:07 on Nov 28.
[root@cgdcpsql1 mysql]# ls -l cgdcpsql1-bin*
-rw-rw----. 1 mysql mysql 1073741927 Nov 25 12:59 cgdcpsql1-bin.001532
-rw-rw----. 1 mysql mysql     327680 Nov 25 12:59 cgdcpsql1-bin.001532.idx
-rw-rw----. 1 mysql mysql 1073741995 Nov 26 09:15 cgdcpsql1-bin.001533
-rw-rw----. 1 mysql mysql     323584 Nov 26 09:15 cgdcpsql1-bin.001533.idx
-rw-rw----. 1 mysql mysql 1073809451 Nov 27 01:25 cgdcpsql1-bin.001534
-rw-rw----. 1 mysql mysql     307200 Nov 27 01:25 cgdcpsql1-bin.001534.idx
-rw-rw----. 1 mysql mysql 1074324069 Nov 27 11:10 cgdcpsql1-bin.001535
-rw-rw----. 1 mysql mysql     237568 Nov 27 11:10 cgdcpsql1-bin.001535.idx
-rw-rw----. 1 mysql mysql 1097485064 Nov 28 08:07 cgdcpsql1-bin.001536
-rw-rw----. 1 mysql mysql     335872 Nov 28 08:07 cgdcpsql1-bin.001536.idx
-rw-rw----. 1 mysql mysql   94644419 Nov 28 09:07 cgdcpsql1-bin.001537
-rw-rw----. 1 mysql mysql      24576 Nov 28 09:02 cgdcpsql1-bin.001537.idx
-rw-rw----. 1 mysql mysql        138 Nov 28 08:07 cgdcpsql1-bin.index
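The rate argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below takes the size of binlog 001536 and its open and rotate times from the listing (it was opened when 001535 was closed) and estimates how much binlog would be written during a 15-minute backup at the average rate. It assumes a uniform write rate, which understates daytime peaks, and binlog volume is only a rough proxy for redo log volume. The year 2024 is taken from the backup log.

```python
from datetime import datetime

# Values read from the ls listing above.
opened = datetime(2024, 11, 27, 11, 10)   # mtime of 001535 = start of 001536
rotated = datetime(2024, 11, 28, 8, 7)    # mtime of 001536
size_bytes = 1097485064                   # size of 001536

# Average write rate over the lifetime of this binlog (assumed uniform).
lifetime_s = (rotated - opened).total_seconds()
avg_rate = size_bytes / lifetime_s

# Upper end of the 10-15 minute --backup stage mentioned above.
backup_window_s = 15 * 60
estimate = avg_rate * backup_window_s

print(f"binlog lifetime:      {lifetime_s / 3600:.2f} h")
print(f"average rate:         {avg_rate / 1024:.1f} KiB/s")
print(f"expected in 15 min:   {estimate / 2**20:.1f} MiB")
```

Under these assumptions the estimate comes out on the order of 13 MB for the backup window, about two orders of magnitude short of a full gigabyte.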
We run mariadb-backup via cron with a minimal set of options like this:
/usr/bin/mariabackup --open-files-limit=131072 --user root --target-dir=... --backup
Is there anything we need to add to mariadb-backup for 11.x? (E.g., I see MDEV-34850 mentioning innodb_log_file_mmap from MDEV-34062, but I don't see it in either "mariadb-backup --help" or the InnoDB system variables page at https://mariadb.com/kb/en/innodb-system-variables ). Even if mmap is not used, the system seems to have plenty of free memory:
[root@cgdcpsql1 mysql]# free -m
               total        used        free      shared  buff/cache   available
Mem:           31840       17849        1173         494       13765       13990
Swap:           2047           1        2046
The platform is RHEL-9; we use the RPM packages provided by mariadb.com.