To anyone who tries to reproduce this: if preparing a backup only took a short time, there cannot have been much log written since the latest checkpoint. You would probably need some concurrent write workload on the server while the backup is running. While the backup is running, the server can receive further writes and produce more log. The backup cannot switch to a later checkpoint on the fly; it will copy all log from the checkpoint that was the latest when the backup started.
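As a rough check before starting a backup, the amount of log written since the latest checkpoint can be queried like this (assuming your version exposes the Innodb_checkpoint_age status variable):
-- bytes of redo log written since the latest checkpoint
SHOW GLOBAL STATUS LIKE 'Innodb_checkpoint_age';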
Starting with MariaDB Server 10.5.7 (after the refactoring in MDEV-23855 and MDEV-23399), log checkpoints are triggered at the end of each batch of writing modified pages from the buffer pool to the data files. That can happen every second, if you have enabled "background flushing" (see MDEV-27295). The innodb_log_file_size and innodb_buffer_pool_size matter a lot here.
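For reference, both parameters can be checked on the running server:
SELECT @@GLOBAL.innodb_log_file_size, @@GLOBAL.innodb_buffer_pool_size;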
When preparing a backup, the buffer pool size is specified by the --use-memory parameter. preparelog.txt seems to be using the default size of about 100MiB. The interesting part of the log starts like this:
2022-10-29 12:50:33 0 [Note] InnoDB: Initializing buffer pool, total size = 104857600, chunk size = 104857600
2022-10-29 12:50:33 0 [Note] InnoDB: Completed initialization of buffer pool
2022-10-29 12:50:33 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=113635532803184,113635804884887
2022-10-29 12:50:48 0 [Note] InnoDB: Read redo log up to LSN=113648002206720
The difference between the two LSNs in the checkpoint message is 272,081,703 bytes. This roughly corresponds to the distance between the previous and the latest checkpoint.
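Spelling out that subtraction (purely illustrative):
SELECT 113635804884887 - 113635532803184 AS log_bytes_between_checkpoints;
-- 272081703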
In InnoDB crash recovery, we cannot know the last recovery LSN until all of the circular ib_logfile0 has been parsed. We only know an upper bound for it: in the worst case, there will be log records right up to the file offset where the latest checkpoint started. The formula used to be more complicated, but with MDEV-14425 (MariaDB Server 10.8), the maximum end LSN should be the checkpoint LSN plus the size of the payload area (the size of ib_logfile0 minus the 12,288-byte header).
In mariadb-backup --prepare, the last recovery LSN would actually be known: it is roughly the sum of the checkpoint LSN and the size of the append-only ib_logfile0 that was created during backup.
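A sketch of both formulas, using made-up file sizes (a hypothetical 512MiB circular ib_logfile0 on the server, and a hypothetical 300MiB append-only ib_logfile0 in the backup directory); only the checkpoint LSN is taken from the log above:
-- server crash recovery: only an upper bound is known
SELECT 113635532803184 + (536870912 - 12288) AS max_end_lsn_server_recovery;
-- mariadb-backup --prepare: the end LSN is roughly known
SELECT 113635532803184 + 314572800 AS approx_end_lsn_prepare;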
In the log, the latest message about reading log that I can see is the following:
2022-10-30 1:44:21 0 [Note] InnoDB: To recover: 527 pages from log
2022-10-30 1:44:35 0 [Note] InnoDB: Starting a batch to recover 5062 pages from redo log.
2022-10-30 1:44:35 0 [Note] InnoDB: Starting a batch to recover 5289 pages from redo log.
2022-10-30 1:44:36 0 [Note] InnoDB: Read redo log up to LSN=113635785294848
2022-10-30 1:44:36 0 [Note] InnoDB: Starting a batch to recover 5390 pages from redo log.
This is actually 19,590,039 less than the LSN at the time the checkpoint was triggered before the backup was initiated.
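Again, the subtraction behind that number:
SELECT 113635804884887 - 113635785294848 AS lsn_difference;
-- 19590039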
In MDEV-21351 (MariaDB 10.5.1), the memory management during crash recovery was simplified and improved. The rule in recv_sys_t::is_memory_exhausted() is that the number of buffer pool pages that are allocated for redo log records must not exceed one third of the size of the buffer pool. Because a memory block can be freed as soon as all log records in it have been applied to data pages, new recovery batches can be started. That is why we are seeing so many of those tiny recovery batches (such as 5,062 pages) above, starting several times per second.
The largest batch seems to be the following:
2022-10-29 12:52:29 0 [Note] InnoDB: Starting a batch to recover 82554 pages from redo log.
This means that up to 33MiB of redo log records (one third of the 100MiB buffer pool) are referring to that many pages. Initially, the buffer pool has space for about 4,266 data pages (roughly two thirds of the 6,400 pages of 16KiB that fit in 100MiB, the rest being reserved for log records). Maybe some page reads can be avoided (MDEV-12699) when the pages are being freed or completely initialized according to log records. The 82,554 pages are 19 times the size of the available buffer pool.
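The arithmetic behind those numbers, spelled out (assuming the default innodb_page_size of 16384 bytes):
SELECT 104857600 DIV 3 AS max_bytes_for_log_records,            -- about 33MiB
       104857600 DIV 16384 AS buffer_pool_pages,                 -- 6400
       (104857600 DIV 16384) * 2 DIV 3 AS pages_left_for_data,   -- 4266
       82554 DIV ((104857600 DIV 16384) * 2 DIV 3) AS ratio;     -- about 19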
According to the log, things seem to run extremely slowly:
2022-10-29 13:14:03 0 [Note] InnoDB: To recover: 988 pages from log
2022-10-29 13:14:18 0 [Note] InnoDB: To recover: 929 pages from log
How could it take 15 seconds to apply changes to only 59 pages? Was there any other I/O or CPU load concurrently running on the computer or the storage system where the backup was being prepared? What is the maximum write speed of the storage where the backup was being prepared? Are tables in ROW_FORMAT=COMPRESSED or page_compressed being used?
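To answer the last question, something like this should list any such tables (row_format = 'Compressed' covers ROW_FORMAT=COMPRESSED; the create_options text should mention page_compressed for page-compressed tables):
SELECT table_schema, table_name, row_format, create_options
FROM information_schema.tables
WHERE row_format = 'Compressed'
   OR create_options LIKE '%page_compressed%';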
What is the size of the ib_logfile0 in the backup directory?
It might make sense to change the rule "use up to one third of the buffer pool for log records" to something like "reserve two thirds of the buffer pool for data pages to process in a batch". Any performance improvement would only be possible if someone could provide a copy of a backup with which this can be reproduced.
Depending on the server workload, one trick to improve --prepare speed would be to trigger a log checkpoint right before starting the backup:
SET @save_pct= @@GLOBAL.innodb_max_dirty_pages_pct;
SET @save_pct_lwm= @@GLOBAL.innodb_max_dirty_pages_pct_lwm;

SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.0;
SET GLOBAL innodb_max_dirty_pages_pct=0.0;

-- wait for the following to reach 0
SELECT variable_value FROM information_schema.global_status
WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';

SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;
SET GLOBAL innodb_max_dirty_pages_pct_lwm = @save_pct_lwm;
-- start the backup
In MDEV-11027, the recovery progress reporting was refactored. For the normal server startup, progress should be reported every 15 seconds. I do not immediately see any reason why it would be different for mariadb-backup --backup.
Can you provide a test case with which the lack of progress can be reproduced?