[MDEV-29996] Duplicated call to buf_page_t::set_ibuf_exist() on recovery Created: 2022-11-10  Updated: 2023-02-02  Resolved: 2022-11-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.8.0, 10.6.6, 10.7.2, 10.8.1, 10.9.0, 10.6.7, 10.6.8, 10.6.9, 10.7.3, 10.7.4, 10.7.5, 10.8.2, 10.8.3, 10.8.4, 10.9.1, 10.9.2, 10.10.0, 10.10.1, 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.6.11, 10.8.6, 10.9.4, 10.10.2, 10.11.0, 10.11.1, 10.9.5, 10.10.3
Fix Version/s: 10.11.2, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: recovery, regression-10.6

Attachments: File fbackup.tar.xz    
Issue Links:
Problem/Incident
is caused by MDEV-27058 Buffer page descriptors are too large Closed

 Description   

With the attached data directory, MariaDB 10.6 and presumably 10.7 would fail to recover:

mariadbd --innodb-page-size=4k --innodb-log-file-size=100663296 --innodb-fast-shutdown=0 --datadir /dev/shm/fbackup/data

10.6 fef9d6ef1db9a4648a54954c38ea4fbab2a6542c

2022-11-10 14:13:40 0 [Note] InnoDB: Initializing buffer pool, total size = 10737418240, chunk size = 134217728
2022-11-10 14:13:41 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-10 14:13:41 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
2022-11-10 14:13:41 0 [Note] InnoDB: Opened 3 undo tablespaces
2022-11-10 14:13:41 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2022-11-10 14:13:41 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=93468483,97150807
2022-11-10 14:13:41 0 [Note] InnoDB: Opened 3 undo tablespaces
2022-11-10 14:13:41 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2022-11-10 14:13:41 0 [Note] InnoDB: Starting final batch to recover 4771 pages from redo log.
mariadbd: /mariadb/10.6m/storage/innobase/include/buf0buf.h:715: void buf_page_t::set_ibuf_exist(): Assertion `s < IBUF_EXIST || s >= REINIT' failed.

I did not check what would happen in a non-debug build, but I think that a hang is quite possible.

I did check that 10.5 (a few changes ahead of MariaDB Server 10.5.18) recovers this data directory just fine.

The following fixes it in 10.6:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index b47b8d30c2a..546541e1082 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -1137,6 +1137,7 @@ class mlog_init_t
 					continue;
 				}
 				mysql_mutex_unlock(&recv_sys.mutex);
+				ut_ad(!block->page.is_ibuf_exist());
 				if (ibuf_page_exists(block->page.id(),
 						     block->zip_size())) {
 					block->page.set_ibuf_exist();
@@ -1148,6 +1149,7 @@ class mlog_init_t
 		}
 
 		mtr.commit();
+		clear();
 	}
 
 	/** Clear the data structure */

It turns out that multiple threads can invoke recv_sys.apply(true) nearly simultaneously. This causes mlog_init to be applied more than once, so buf_page_t::set_ibuf_exist() can be called a second time on the same page.

The mlog_init was added already in MDEV-12699, and the ibuf_page_exists() call was added in MDEV-19514. The application of mlog_init was ‘idempotent’ until the block descriptor data structure was refactored in MDEV-27058 in MariaDB Server 10.6.6.

Note: The InnoDB change buffer was disabled by default in MDEV-27734.



 Comments   
Comment by Marko Mäkelä [ 2022-11-10 ]

I realized that the debug assertion in my patch above duplicates the failing assertion inside buf_page_t::set_ibuf_exist(). Only the call to clear() is needed.

Generated at Thu Feb 08 10:12:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.