[MDEV-28185] InnoDB generates redundant log checkpoints Created: 2022-03-28 Updated: 2022-04-05 Resolved: 2022-03-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9 |
| Fix Version/s: | 10.5.16, 10.6.8, 10.7.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance | ||
| Issue Links: |
|
| Description |
|
The function log_checkpoint() or log_checkpoint_low() tries to avoid redundant writes. Here is the 10.3 version of log_checkpoint():
This logic fails to take two things into account:
In either case, the difference between oldest_lsn and log_sys.last_checkpoint_lsn could be more than the size of the checkpoint record. The simplest fix could be to avoid writing a new checkpoint if oldest_lsn has not changed since the previous checkpoint. |
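The two skip rules discussed above can be contrasted in a small sketch. This is an illustration under stated assumptions, not the MariaDB source: the `lsn_t` alias, both function names, and the `checkpoint_record_size` parameter are hypothetical, and the distance-based rule is only the shape that the description implies.

```cpp
#include <cstdint>

// Hypothetical stand-in for InnoDB's log sequence number type.
using lsn_t = std::uint64_t;

// Assumed shape of the existing heuristic: skip the checkpoint when
// oldest_lsn is within one checkpoint record of the previous checkpoint.
// checkpoint_record_size is a parameter here, not the real constant.
inline bool skip_by_distance(lsn_t oldest_lsn, lsn_t last_checkpoint_lsn,
                             lsn_t checkpoint_record_size)
{
  return oldest_lsn <= last_checkpoint_lsn + checkpoint_record_size;
}

// Simplest rule proposed in the description: skip the checkpoint only
// when oldest_lsn has not moved since the previous checkpoint.
inline bool skip_if_unchanged(lsn_t oldest_lsn, lsn_t last_checkpoint_lsn)
{
  return oldest_lsn == last_checkpoint_lsn;
}
```

The distance-based rule depends on how many bytes the checkpoint record actually consumed in the LSN space, which is exactly what can vary; the equality rule needs no such constant.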
| Comments |
| Comment by Marko Mäkelä [ 2022-03-29 ] | |||||||||||||
|
My attempt at introducing a log_sys.last_checkpoint_last_lsn field did not work. It could incorrectly cause log_checkpoint() to be skipped if all log had been durably written during a previous flushing batch but some pages had not been flushed. For that approach to work, we would also have to remember the previous end_lsn. This was revealed by the test innodb.innodb_bug34300. A simpler fix is to check whether the previous checkpoint wrote only a FILE_CHECKPOINT record, and to skip the checkpoint in that case only, even if the record spanned two 512-byte log blocks. This fix was already implemented in | |||||||||||||
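One way to picture the "only a FILE_CHECKPOINT record was written" check, including the case where the record spans two 512-byte log blocks, is the sketch below. It is not the actual patch. The 512-byte block size and the 12-byte (0x0c) block header come from this report; the 4-byte block trailer, the rule that a full block advances the LSN past the trailer and the next header, and all function and constant names are assumptions made for illustration. The record size is passed in as a parameter rather than taken from the server source.

```cpp
#include <cstdint>

using lsn_t = std::uint64_t;            // hypothetical alias

constexpr lsn_t kBlockSize = 512;       // log block size, per the report
constexpr lsn_t kBlockHeader = 12;      // payload starts at 0x0c, per the report
constexpr lsn_t kBlockTrailer = 4;      // ASSUMPTION: 4-byte block trailer

// LSN reached after appending `len` payload bytes at `start`.
// Precondition: start % kBlockSize lies inside the payload area,
// that is, in [kBlockHeader, kBlockSize - kBlockTrailer).
inline lsn_t end_lsn_after_append(lsn_t start, lsn_t len)
{
  lsn_t lsn = start;
  for (;;) {
    // Payload bytes still free in the current block.
    const lsn_t room = kBlockSize - kBlockTrailer - lsn % kBlockSize;
    if (len < room)
      return lsn + len;
    // The block fills up: step over its trailer and the next header.
    lsn += room + kBlockTrailer + kBlockHeader;
    len -= room;
  }
}

// True if everything between the previous checkpoint LSN and end_lsn
// is exactly one record of `record_size` payload bytes.
inline bool only_checkpoint_record_written(lsn_t last_checkpoint_lsn,
                                           lsn_t end_lsn, lsn_t record_size)
{
  return end_lsn_after_append(last_checkpoint_lsn, record_size) == end_lsn;
}
```

Under these assumptions, a 12-byte record that starts 8 payload bytes before the end of a block advances the LSN by 28 bytes (12 bytes of record plus 16 bytes of block framing), while the same record placed well inside a block advances it by only 12.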
| Comment by Marko Mäkelä [ 2022-03-30 ] | |||||||||||||
|
I verified the fix with the following patch applied on top of it:
This added assertion would eventually fail during the following execution:
In the core dump, we had age=28, and the oldest_lsn field (which was equal to the latest checkpoint LSN) was 0x3a2d811. That is, (oldest_lsn&511)==0x11, which is only 5 bytes after the start of the log block payload area (0x0c). In other words, without the fix, we would have written a redundant FILE_CHECKPOINT record. Redundant FILE_CHECKPOINT records will still be written if any FILE_MODIFY records were written during the previous checkpoint. |
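The block-offset arithmetic used in this comment can be reproduced with a few constant expressions. This is only a sketch: the 512-byte block size and the 0x0c payload start are taken from the text above, while the constant and function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kLogBlockSize = 512;   // per the report
constexpr std::uint64_t kPayloadStart = 0x0c;  // per the report

// Byte offset of an LSN within its 512-byte log block.
constexpr std::uint64_t block_offset(std::uint64_t lsn)
{
  return lsn & (kLogBlockSize - 1);
}

// Distance of an LSN from the start of the block's payload area.
// Only meaningful when block_offset(lsn) >= kPayloadStart.
constexpr std::uint64_t payload_offset(std::uint64_t lsn)
{
  return block_offset(lsn) - kPayloadStart;
}
```

Calling `block_offset(0x3a2d811)` and `payload_offset(0x3a2d811)` gives the position of the oldest_lsn from the core dump within its block and within the payload area.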