[MDEV-30000] make mariadb-backup to force an innodb checkpoint Created: 2022-11-11 Updated: 2024-01-12 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Backup, Storage Engine - InnoDB |
| Affects Version/s: | None |
| Fix Version/s: | 10.5, 10.6 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Sergei Golubchik | Assignee: | Sergei Golubchik |
| Resolution: | Unresolved | Votes: | 6 |
| Labels: | regression |
| Description |
|
Since One way to solve it would be to let mariadb-backup force an InnoDB checkpoint before a backup. There are many ways of doing it; the innodb.page_cleaner test shows one of them, and marko knows more. Perhaps it should be optional but, most likely, enabled by default. Alternatively, InnoDB could force a checkpoint automatically when entering a certain backup stage. |
| Comments |
| Comment by Sebastian Bergmann [ 2022-11-14 ] |
|
Since we have been stuck on this for weeks, we will now try your suggested workaround, sergei. As mentioned on answers.launchpad.net/maria/+question/703793, permanently setting innodb_max_dirty_pages_pct_lwm=0.001 has a huge negative impact on write I/O (8-10 times):
I hope that setting innodb_max_dirty_pages_pct_lwm will not impact performance too much at the moment it is applied; we will let you know. One question though: since this has existed since MariaDB 10.5.7, does that mean we are the only ones on this planet using incremental MariaDB backups? BR |
| Comment by Sebastian Bergmann [ 2022-11-18 ] |
|
We managed to create a script using these mechanics, and it works for us. Of course, we are still interested in a solid official solution. BR |
| Comment by Sergei Golubchik [ 2022-11-18 ] |
|
ryanthur, thanks for the confirmation! Good that it worked. |
| Comment by Marko Mäkelä [ 2023-03-16 ] |
|
For MariaDB 10.5.7 and 10.5.8, the default value of the parameter innodb_max_dirty_pages_pct_lwm=0 had a different meaning, which was corrected in I will try to reproduce this, to see if this would be fixed by |
| Comment by Marko Mäkelä [ 2023-03-16 ] |
|
I created incremental_innodb.test
On a development branch that includes fixes of
I got a similar result with the current head of the 10.6 branch as well:
Both executables were built with cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo, and because the build directory was in /dev/shm, so was the data directory. The number at the end of the last line is the test execution time in milliseconds (3 minutes and 23 seconds, or 4 minutes and 45 seconds). ryanthur, can you please provide some exact steps for reproducing this problem, similar to the above test case? |
| Comment by Michael Roessler [ 2023-04-13 ] |
|
Hello Marko, sorry for the delay. I am a colleague of Sebastian Bergmann and would like to provide some information. I tested the incremental backups with and without checkpoints. In my opinion, it is still necessary to provoke checkpoints manually with the following mariadb commands before mariabackup starts:
Between the two mysql commands I waited up to 120 seconds. The mariabackup command is part of a bigger script, but it looks like this; it uses --compress and the qpress command, which is available from Percona:
I got the following results with 10.6.10 and 10.6.12, with and without provoking checkpoints.
First incremental backup: +100,000 data sets. The second incremental backup should be smaller because it contains one tenth of the data volume. This is only the case if I do checkpoints. I produced the data sets with a bash script:
For the second incremental backup I used 10000 (ten thousand) instead of 100000 (one hundred thousand) in the script. I guess it is still not so easy to reproduce, but I hope it is easy enough. If you have any questions, please let me know. Kind regards Michael |
| Comment by Marko Mäkelä [ 2023-05-12 ] |
|
mroessler65536, thank you for the reproducer, and sorry for missing your update, which caused this relatively simple fix to miss the MariaDB Server 10.6.13 release. I will take a look, hopefully soon. |
| Comment by Marko Mäkelä [ 2023-06-06 ] |
|
I can reproduce this with the following test case if I remove the two sections enclosed in SET GLOBAL:
The commented-out lines at the end would replace the ib_logfile0 with zero-length files and then restore the backup. If I run the test case as is (retaining the checkpoint control), I will get close to an optimal result:
The test confirms my understanding of the logic: triggering a log checkpoint is useful only before a full backup is made. I may have misunderstood something, but incremental backups would seem to actually work by copying a section of ib_logfile0 starting where the previous backup left off. Perhaps the .delta files were a work-around from an era when DDL operations were not crash-safe in InnoDB (before MariaDB Server 10.6)? I am basing this reasoning on the following error message:
|
| Comment by Marko Mäkelä [ 2023-06-06 ] |
|
I forgot that possibly a typical use case of incremental backup is one where the current log checkpoint LSN is after the end LSN of a previous backup. In that case, all data file pages that have been modified since the previous end LSN will have to be copied. It might actually be useful to trigger a checkpoint also when initiating an incremental backup. |
| Comment by Marko Mäkelä [ 2023-06-06 ] |
|
serg, can you please suggest how SET GLOBAL should be invoked by mariadb-backup in the most robust way, so that in case the connection between backup and the server is severed, the previous values will be restored? |
| Comment by Sergei Golubchik [ 2023-06-06 ] |
|
Wouldn't it be better if the server automatically triggered a checkpoint when the backup starts, without manipulating innodb_max_dirty_pages_pct? |
| Comment by Marko Mäkelä [ 2023-06-06 ] |
|
If we had server-side or server-assisted backup (MDEV-14992), it would be straightforward to implement logic "when backup starts". InnoDB checkpoints will automatically be triggered at the end of each page writing batch. The default settings are such that the latest checkpoint may be rather old, mainly due to innodb_max_dirty_pages_pct=90. The actual checkpoint age (difference between checkpoint LSN and current LSN) depends on the key distribution accessed during the workload, and to some extent, on luck. |
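The checkpoint age mentioned above can be estimated from the outside. The following is a minimal sketch, assuming the LOG section of SHOW ENGINE INNODB STATUS contains lines of the form "Log sequence number N" and "Last checkpoint at N"; the helper name checkpoint_age and the sample LSN values are made up for illustration and do not come from this ticket.

```python
import re


def checkpoint_age(innodb_status: str) -> int:
    """Return current LSN minus last checkpoint LSN, in bytes of redo log."""
    current = re.search(r"^Log sequence number\s+(\d+)", innodb_status, re.MULTILINE)
    checkpoint = re.search(r"^Last checkpoint at\s+(\d+)", innodb_status, re.MULTILINE)
    if current is None or checkpoint is None:
        raise ValueError("LOG section not found in InnoDB status text")
    return int(current.group(1)) - int(checkpoint.group(1))


# Hypothetical excerpt of the status text; the numbers are invented.
sample = """\
---
LOG
---
Log sequence number 5000000
Log flushed up to   5000000
Pages flushed up to 1200000
Last checkpoint at  1000000
"""

print(checkpoint_age(sample))  # 4000000
```

A large value right before a backup would indicate that the latest checkpoint is old, which is the situation described in this comment.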
| Comment by Marko Mäkelä [ 2023-06-06 ] |
|
Here is a smaller test case:
A possible code fix would be along these lines. It is missing the wait for INNODB_BUFFER_POOL_PAGES_DIRTY to reach 0, and I think that we would want to have a command line option to disable this feature. I would also find it better if the "roll back" of SET GLOBAL could be initiated on the server side.
|
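For readers following the workaround from the client side, here is a rough sketch of the sequence discussed in this thread: remember innodb_max_dirty_pages_pct, set it to 0, wait for Innodb_buffer_pool_pages_dirty to reach 0, and restore the old value. It is not the patch referred to above. The run_sql callable, the function name and the timeout parameters are hypothetical; run_sql stands for any function that executes one statement and returns the result rows as tuples.

```python
import time


def flush_dirty_pages(run_sql, timeout_s=120.0, poll_s=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Set innodb_max_dirty_pages_pct=0, wait for 0 dirty pages, restore.

    Returns True if the dirty page count reached 0 before the timeout.
    run_sql is a hypothetical callable: SQL text in, list of row tuples out.
    """
    saved = run_sql("SELECT @@GLOBAL.innodb_max_dirty_pages_pct")[0][0]
    run_sql("SET GLOBAL innodb_max_dirty_pages_pct=0")
    try:
        deadline = clock() + timeout_s
        while True:
            rows = run_sql(
                "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'")
            if int(rows[0][1]) == 0:
                return True
            if clock() >= deadline:
                return False
            sleep(poll_s)
    finally:
        # Restores the old value only while this client is still connected.
        run_sql(f"SET GLOBAL innodb_max_dirty_pages_pct={saved}")


# Stand-in for a real connection, so that the sketch runs without a server.
_dirty_counts = [2, 0]


def fake_run_sql(sql):
    if sql.startswith("SELECT"):
        return [(90.0,)]
    if sql.startswith("SHOW"):
        return [("Innodb_buffer_pool_pages_dirty", str(_dirty_counts.pop(0)))]
    return []


print(flush_dirty_pages(fake_run_sql, sleep=lambda seconds: None))  # True
```

The finally block covers errors and timeouts in the client only. If the connection is severed, the server keeps the value 0, which is the robustness concern raised earlier in this thread and the reason a server-side "roll back" would be preferable.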
| Comment by Marko Mäkelä [ 2023-06-07 ] |
|
I came up with a solution that almost works. The handler for 70100 (killed connection) does not seem to run either on KILL from another connection or on disconnect of the client that submitted the SQL for execution. Here is a test case that demonstrates the latter problem:
The following output indicates that the SQL that was submitted by connection cleanup indeed did not run to completion, but it did not clean up afterwards either:
If it had run to completion, the latter query would return 0. If the handler for 70100 had executed, the global variable would have been restored to its default value (90). The logic behind the 70100 handler is also related to some rather frequent test failures ( |
| Comment by Marko Mäkelä [ 2023-07-05 ] |
|
In InnoDB (any MariaDB version), a log checkpoint can simply be initiated by calling the function log_make_checkpoint(). It could be simplest to call that function when executing the appropriate BACKUP STAGE statement. |
| Comment by Marko Mäkelä [ 2024-01-12 ] |
|
serg, in debug builds, one could invoke
to force a log checkpoint. If we made that parameter available in all builds, then this bug could easily be fixed by making mariadb-backup --backup issue that statement under some (which?) conditions. |