[MDEV-27967] Assertion !buf_pool.any_io_pending() failed in srv_prepare_to_delete_redo_log_file() Created: 2022-03-01  Updated: 2022-03-01  Resolved: 2022-03-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.7, 10.8
Fix Version/s: 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: debug, race


 Description   

The log resizing test occasionally fails like this:

10.8 972b45642afee364cd140f17ae46c00cd0af6e11

innodb.log_file_size '4k,innodb'         w5 [ fail ]
2022-02-14 17:45:38 0 [Note] InnoDB: Starting final batch to recover 4 pages from redo log.
2022-02-14 17:45:38 0 [Note] InnoDB: Resizing redo log from 12.000MiB to 5.000MiB; LSN=24674766
mariadbd: /home/buildbot/maria-slave/fulltest-debug-big-only/build/storage/innobase/srv/srv0start.cc:852: lsn_t srv_prepare_to_delete_redo_log_file(): Assertion `!buf_pool.any_io_pending()' failed.

I was able to reproduce this failure locally by running this test 120 times in parallel with itself (3 times the number of hardware threads reported by nproc).

In the core dump, the assertion would have held. Therefore, it looks like we must actually protect the reads in this debug assertion with buf_pool.mutex in order to avoid false alarms. With the following patch, I am not able to reproduce any crash. The function is only being called by debug assertions (which are not present in release builds).

diff --git a/storage/innobase/include/buf0buf.h b/storage/innobase/include/buf0buf.h
index 89987b05fdb..fec8dcb4e2a 100644
--- a/storage/innobase/include/buf0buf.h
+++ b/storage/innobase/include/buf0buf.h
@@ -2042,9 +2042,14 @@ class buf_pool_t
   buf_tmp_buffer_t *io_buf_reserve() { return io_buf.reserve(); }
 
   /** @return whether any I/O is pending */
-  bool any_io_pending() const
+  bool any_io_pending()
   {
-    return n_pend_reads || n_flush_LRU() || n_flush_list();
+    if (n_pend_reads)
+      return true;
+    mysql_mutex_lock(&mutex);
+    const bool any_pending{n_flush_LRU_ || n_flush_list_};
+    mysql_mutex_unlock(&mutex);
+    return any_pending;
   }
   /** @return total amount of pending I/O */
   ulint io_pending() const



 Comments   
Comment by Marko Mäkelä [ 2022-03-01 ]

Here is the result of my local testing:

10.5 a710016d578dc1165c511db19a5d4a094e736012

innodb.log_file_size '4k,innodb'         w15 [ 100 pass ]  31791
innodb.log_file_size '4k,innodb'         w37 [ 100 pass ]  31869
innodb.log_file_size '4k,innodb'         w59 [ 100 pass ]  32189
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 409184.409 of 3497 seconds executing testcases
 
Completed: All 12000 tests were successful.

Generated at Thu Feb 08 09:57:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.