[MDEV-23807] Assertion n_pending_flushes failed in fil_node_t::prepare_to_close_or_detach() Created: 2020-09-24  Updated: 2020-10-06  Resolved: 2020-09-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.4
Fix Version/s: 10.5.7

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, shutdown

Issue Links:
Problem/Incident
is caused by MDEV-15053 Reduce buf_pool_t::mutex contention Closed

 Description   

In a 10.5-based branch, we got an assertion failure during the shutdown of mariabackup --prepare:

2020-09-18  6:37:20 0 [Note] InnoDB: Initializing buffer pool, total size = 104857600, chunk size = 104857600
2020-09-18  6:37:20 0 [Note] InnoDB: Completed initialization of buffer pool
2020-09-18  6:37:21 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-09-18  6:37:21 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=3423646
2020-09-18  6:37:21 0 [Note] InnoDB: Starting final batch to recover 44 pages from redo log.
2020-09-18  6:37:22 0 [Note] InnoDB: Last binlog file './mysql-bin.000001', position 970415
[00] 2020-09-18 06:37:23 Last binlog file ./mysql-bin.000001, position 970415
2020-09-18 06:37:23 0x2bf14d750800  InnoDB: Assertion failure in file /home/mleich/Server/bb-10.5-MDEV-23399B/storage/innobase/fil/fil0fil.cc line 508
InnoDB: Failing assertion: n_pending_flushes == 0

I believe that this failure is also possible during server shutdown.

The fix should be to check for pending flushes, in addition to pending I/O operations, before closing each file in fil_close_all_files():

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index f549938bfc2..84425e0d1b6 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -1601,7 +1601,8 @@ void fil_close_all_files()
 				if (!node->is_open()) {
 					goto next;
 				}
-				if (!node->n_pending) {
+				if (!node->n_pending
+				    && !node->n_pending_flushes) {
 					node->close();
 					goto next;
 				}
@@ -1609,7 +1610,9 @@ void fil_close_all_files()
 
 			ib::error() << "File '" << node->name
 				    << "' has " << node->n_pending
-				    << " operations";
+				    << " operations and "
+				    << node->n_pending_flushes
+				    << " flushes";
 		}
 
 		space = UT_LIST_GET_NEXT(space_list, space);
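The effect of the patch can be illustrated with a minimal, self-contained sketch. It assumes a heavily simplified model: FileNode and close_all_files() are hypothetical stand-ins for fil_node_t and fil_close_all_files(), only the two counters involved in this bug are modelled, and the real function walks the tablespace list and reports through ib::error() rather than fprintf(). The assert() in close() plays the role of the n_pending_flushes == 0 check in fil_node_t::prepare_to_close_or_detach().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for InnoDB's fil_node_t;
// only the fields involved in this bug are modelled.
struct FileNode {
  std::string name;
  bool open = true;
  unsigned n_pending = 0;          // pending I/O operations
  unsigned n_pending_flushes = 0;  // pending flush requests

  explicit FileNode(std::string n, unsigned ops = 0, unsigned flushes = 0)
      : name(std::move(n)), n_pending(ops), n_pending_flushes(flushes) {}

  bool is_open() const { return open; }

  // Models the invariant that failed in the report: a file must not be
  // closed while a flush on it is still pending.
  void close() {
    assert(n_pending_flushes == 0);
    open = false;
  }
};

// Simplified sketch of the patched check in fil_close_all_files().
// Closes every idle open file and returns how many open files were skipped.
std::size_t close_all_files(std::vector<FileNode>& nodes) {
  std::size_t skipped = 0;
  for (FileNode& node : nodes) {
    if (!node.is_open()) continue;
    // The fix: require both counters to be zero, not only n_pending.
    if (!node.n_pending && !node.n_pending_flushes) {
      node.close();
      continue;
    }
    std::fprintf(stderr, "File '%s' has %u operations and %u flushes\n",
                 node.name.c_str(), node.n_pending, node.n_pending_flushes);
    ++skipped;
  }
  return skipped;
}
```

With the pre-patch condition (checking only n_pending), a node that has no pending I/O but one pending flush would reach close() and trip the assertion, which corresponds to the failure in the log above. With both counters checked, such a node is skipped and reported instead. This sketch does not model any waiting or retrying that the real shutdown code may perform.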



 Comments   
Comment by Marko Mäkelä [ 2020-09-25 ]

On closer inspection, it looks like this may have been caused by MDEV-15053, where we removed some tablespace lookups. The code that the Description proposes to change was introduced in MDEV-15053.

Generated at Thu Feb 08 09:25:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.