[MDEV-29905] Change buffer operations fail to check for log file overflow Created: 2022-10-28  Updated: 2022-11-23  Resolved: 2022-11-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: corruption

Issue Links:
Blocks
Relates
relates to MDEV-29982 Improve the InnoDB log overwrite erro... Closed
relates to MDEV-29984 innodb_fast_shutdown=0 fails to repor... Closed
relates to MDEV-30009 InnoDB shutdown hangs when the change... Closed
relates to MDEV-13637 InnoDB change buffer housekeeping can... Closed
relates to MDEV-27734 Set innodb_change_buffering=none by d... Closed
relates to MDEV-27784 log_overwrite_warning displays an err... Closed

 Description   

A log excerpt that was originally posted in MDEV-27784 strongly suggests that the change buffer merge may cause an InnoDB log file overflow:

2022-10-24 23:20:16 0 [Note] InnoDB: FTS optimize thread exiting.
2022-10-24 23:20:16 0 [Note] InnoDB: to purge 5 transactions
2022-10-24 23:20:17 0 [Note] InnoDB: Starting shutdown...
2022-10-24 23:20:17 0 [Note] InnoDB: Dumping buffer pool(s) to /data/maria_data/ib_buffer_pool
2022-10-24 23:20:17 0 [Note] InnoDB: Restricted to 354860 pages due to innodb_buf_pool_dump_pct=25
2022-10-24 23:20:17 0 [Note] InnoDB: Buffer pool(s) dump completed at 221024 23:20:17
2022-10-24 23:57:37 0 [ERROR] InnoDB: The age of the last checkpoint is 966373838, which exceeds the log capacity 966365799.
2022-10-24 23:57:53 0 [ERROR] InnoDB: The age of the last checkpoint is 974973721, which exceeds the log capacity 966365799.
2022-10-24 23:58:09 0 [ERROR] InnoDB: The age of the last checkpoint is 983054253, which exceeds the log capacity 966365799.
2022-10-24 23:58:25 0 [ERROR] InnoDB: The age of the last checkpoint is 991661268, which exceeds the log capacity 966365799.
...
2022-10-25  2:02:57 0 [ERROR] InnoDB: The age of the last checkpoint is 4351924949, which exceeds the log capacity 966365799.

At this point, the server was forcibly killed, and it was unable to recover, because the last valid checkpoint had been overwritten about 4½ times (checkpoint age 4351924949 against a log capacity of 966365799).
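The "4½ times" figure can be reproduced from the last error message by dividing the reported checkpoint age by the reported log capacity. The following is only a minimal sketch of that arithmetic; the helper name log_wraparounds() is hypothetical and is not an InnoDB function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of InnoDB): the checkpoint age expressed as a
// multiple of the log capacity. A value above 1.0 means that log records
// needed for recovery from the last checkpoint have been overwritten.
double log_wraparounds(std::uint64_t checkpoint_age, std::uint64_t log_capacity)
{
  assert(log_capacity > 0);
  return static_cast<double>(checkpoint_age) /
         static_cast<double>(log_capacity);
}
```

With the values from the log excerpt, log_wraparounds(4351924949ULL, 966365799ULL) is roughly 4.5, while the first error message (age 966373838) corresponds to a ratio just above 1.0, that is, the moment the capacity was first exceeded.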

A design constraint is that before any buffer page latch is acquired in a mini-transaction that will write something, log_free_check() must be called. In the entire change buffer subsystem, only the function ibuf_remove_free_page() contains such a call, added in MDEV-13637.
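The effect of the constraint can be illustrated with a toy model. This is not InnoDB code: ToyRedoLog, free_check(), write_mtr() and run_merges() are hypothetical names, the capacity and record sizes are made up, and the stand-in for log_free_check() takes a margin parameter purely to keep the sketch short. It only shows why a loop of write mini-transactions that never checks for free log space will eventually let the checkpoint age exceed the log capacity.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: these types and members are hypothetical and do not
// correspond to the real InnoDB data structures.
struct ToyRedoLog
{
  std::uint64_t capacity;           // bytes that may be written past the checkpoint
  std::uint64_t lsn = 0;            // end of the log written so far
  std::uint64_t checkpoint_lsn = 0; // start of the log needed for recovery
  bool overflowed = false;          // set once the age exceeds the capacity

  explicit ToyRedoLog(std::uint64_t cap) : capacity(cap) {}

  std::uint64_t age() const { return lsn - checkpoint_lsn; }

  // Stand-in for log_free_check(): if fewer than `margin` bytes remain,
  // advance the checkpoint. In the real server this may wait for page
  // flushing, which is why it must happen before any page latch is taken.
  void free_check(std::uint64_t margin)
  {
    assert(margin <= capacity);
    if (age() + margin > capacity)
      checkpoint_lsn = lsn; // pretend that all dirty pages were flushed
  }

  // Stand-in for committing a write mini-transaction of `bytes` log bytes.
  void write_mtr(std::uint64_t bytes)
  {
    lsn += bytes;
    if (age() > capacity)
      overflowed = true;
  }
};

// Runs `n` write mini-transactions of 64 bytes each against a log of
// capacity 1000. Returns true if the checkpoint age ever exceeded the capacity.
bool run_merges(bool call_free_check, unsigned n)
{
  ToyRedoLog log(1000);
  const std::uint64_t mtr_bytes = 64;
  for (unsigned i = 0; i < n; i++)
  {
    if (call_free_check)
      log.free_check(mtr_bytes); // before acquiring any page latch
    log.write_mtr(mtr_bytes);
  }
  return log.overflowed;
}
```

In this model, run_merges(false, 100) reports an overflow, whereas run_merges(true, 100) does not, because the checkpoint is advanced whenever the next mini-transaction would not fit.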



 Comments   
Comment by Marko Mäkelä [ 2022-11-08 ]

I added log_free_check() calls to the change buffer subsystem before every start of a write mini-transaction where I think it is safe to do so (that is, without causing the server to hang). I think that before 10.5, the call in ibuf_merge_in_background() is what fixes the observed problem on slow shutdown. On merge to 10.5, that change needs to be applied to ibuf_merge_all(), which was introduced in MDEV-19514.

Comment by Marko Mäkelä [ 2022-11-21 ]

I believe that the provided log output indicates that the shutdown is suffering from MDEV-30009 as well.
