[MDEV-32861] InnoDB hangs when running out of I/O slots Created: 2023-11-22  Updated: 2023-12-05  Resolved: 2023-11-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: affects-tests, hang, regression

Issue Links:
Blocks
Problem/Incident
is caused by MDEV-26055 Adaptive flushing is still not gettin... Closed
is caused by MDEV-29367 Refactor tpool::cache Closed

 Description   

To reduce the rate of the infamous 'fake hangs' in rr replay I experimented with the following patch:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 58626c4667c..5fecbf15453 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -19525,7 +19525,7 @@ static MYSQL_SYSVAR_UINT(read_io_threads, srv_n_read_io_threads,
 static MYSQL_SYSVAR_UINT(write_io_threads, srv_n_write_io_threads,
   PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
   "Number of background write I/O threads in InnoDB.",
-  NULL, NULL, 4, 2, 64, 0);
+  NULL, NULL, 4, 1, 64, 0);
 
 static MYSQL_SYSVAR_ULONG(force_recovery, srv_force_recovery,
   PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
diff --git a/storage/innobase/include/os0file.h b/storage/innobase/include/os0file.h
index 5a8a8dd7fd9..673239d9468 100644
--- a/storage/innobase/include/os0file.h
+++ b/storage/innobase/include/os0file.h
@@ -260,7 +260,7 @@ struct os_file_size_t {
 	os_offset_t	m_alloc_size;
 };
 
-constexpr ulint OS_AIO_N_PENDING_IOS_PER_THREAD= 256;
+constexpr ulint OS_AIO_N_PENDING_IOS_PER_THREAD= 1;
 
 extern Atomic_counter<ulint> os_n_file_reads;
 extern ulint	os_n_file_writes;

With this patch, the following invocation would hang on bootstrap:

./mtr --boot-rr --mysqld=--innodb-write-io-threads=1 main.1st

The first reason turns out to be missing signaling of a condition variable:

diff --git a/tpool/tpool_structs.h b/tpool/tpool_structs.h
index b49204f2d75..96c250ebd59 100644
--- a/tpool/tpool_structs.h
+++ b/tpool/tpool_structs.h
@@ -138,12 +138,13 @@ template<typename T> class cache
   {
     std::unique_lock<std::mutex> lk(m_mtx);
     assert(!is_full());
+    const bool was_empty= is_empty();
     // put element to the logical end of the array
     m_cache[--m_pos] = ele;
 
     /* Notify waiters  when the cache becomes
      not empty, or when it becomes full */
-    if (m_pos == 1 || (m_waiters && is_full()))
+    if (was_empty || (is_full() && m_waiters))
       m_cv.notify_all();
   }
 

As far as I can tell, the above condition was broken in MDEV-29367.

Even with the above patch applied, the bootstrap would hang. The remaining reason is MDEV-24313 and innodb_doublewrite=ON. So, we must retain the minimum value of innodb_write_io_threads=2 and discard the first hunk of our instrumentation patch.

A further problem would be that many encryption related tests would end up in an infinite loop in buf_pool_t::io_buf_t::reserve(), which is something that had been last changed in MDEV-26055. The following patch would fix that:

@@ -1513,14 +1513,17 @@ void buf_pool_t::io_buf_t::close()
   n_slots= 0;
 }
 
-buf_tmp_buffer_t *buf_pool_t::io_buf_t::reserve()
+buf_tmp_buffer_t *buf_pool_t::io_buf_t::reserve(bool wait_for_reads)
 {
   for (;;)
   {
     for (buf_tmp_buffer_t *s= slots, *e= slots + n_slots; s != e; s++)
       if (s->acquire())
         return s;
+    buf_dblwr.flush_buffered_writes();
     os_aio_wait_until_no_pending_writes();
+    if (!wait_for_reads)
+      continue;
     for (buf_tmp_buffer_t *s= slots, *e= slots + n_slots; s != e; s++)
       if (s->acquire())
         return s;

When this function is being called from buf_page_decrypt_after_read(), we must pass wait_for_reads=false, because the current thread actually is executing in a read completion callback, and therefore os_aio_wait_until_no_pending_reads(); after the continue; would hang. In any case, we must schedule a write of a possibly partial doublewrite batch so that os_aio_wait_until_no_pending_writes(); has a chance of finishing.

With these fixes, a run with OS_AIO_N_PENDING_IOS_PER_THREAD=1 and

./mtr --big-test --force --parallel=auto --mysqld=--loose-innodb-read-io-threads=1 --mysqld=--loose-innodb-write-io-threads=2

completes fine in both 10.5 and 10.6, with the exception of the test sys_vars.innodb_read_io_threads_basic.

The motivation of setting these parameters is to give fewer chances to rr record to schedule threads in an unfair fashion that often leads to 'fake hangs' where write_slots->m_cache.m_pos (or less often read_slots->m_cache.m_pos) would be nonzero and some InnoDB threads are waiting for an extremely long time for a buffer page latch.



 Comments   
Comment by Vladislav Vaintroub [ 2023-11-22 ]

Good catch in tpool::cache bug. Seems good to push for me.

Generated at Thu Feb 08 10:34:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.