[MDEV-32861] InnoDB hangs when running out of I/O slots - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6, 10.11, 10.5(EOL), 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Storage Engine - InnoDB
Labels:

Description

To reduce the rate of the infamous 'fake hangs' in rr replay I experimented with the following patch:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc

index 58626c4667c..5fecbf15453 100644

--- a/storage/innobase/handler/ha_innodb.cc

+++ b/storage/innobase/handler/ha_innodb.cc

@@ -19525,7 +19525,7 @@ static MYSQL_SYSVAR_UINT(read_io_threads, srv_n_read_io_threads,

 static MYSQL_SYSVAR_UINT(write_io_threads, srv_n_write_io_threads,

   PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,

   "Number of background write I/O threads in InnoDB.",

-  NULL, NULL, 4, 2, 64, 0);

+  NULL, NULL, 4, 1, 64, 0);

 static MYSQL_SYSVAR_ULONG(force_recovery, srv_force_recovery,

   PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,

diff --git a/storage/innobase/include/os0file.h b/storage/innobase/include/os0file.h

index 5a8a8dd7fd9..673239d9468 100644

--- a/storage/innobase/include/os0file.h

+++ b/storage/innobase/include/os0file.h

@@ -260,7 +260,7 @@ struct os_file_size_t {

 	os_offset_t	m_alloc_size;

};

-constexpr ulint OS_AIO_N_PENDING_IOS_PER_THREAD= 256;

+constexpr ulint OS_AIO_N_PENDING_IOS_PER_THREAD= 1;

 extern Atomic_counter<ulint> os_n_file_reads;

 extern ulint	os_n_file_writes;

With this patch, the following invocation would hang on bootstrap:

./mtr --boot-rr --mysqld=--innodb-write-io-threads=1 main.1st

The first reason turns out to be missing signaling of a condition variable:

diff --git a/tpool/tpool_structs.h b/tpool/tpool_structs.h

index b49204f2d75..96c250ebd59 100644

--- a/tpool/tpool_structs.h

+++ b/tpool/tpool_structs.h

@@ -138,12 +138,13 @@ template<typename T> class cache

     std::unique_lock<std::mutex> lk(m_mtx);

     assert(!is_full());

+    const bool was_empty= is_empty();

     // put element to the logical end of the array

     m_cache[--m_pos] = ele;

     /* Notify waiters  when the cache becomes

      not empty, or when it becomes full */

-    if (m_pos == 1 || (m_waiters && is_full()))

+    if (was_empty || (is_full() && m_waiters))

       m_cv.notify_all();

As far as I can tell, the above condition was broken in ~~MDEV-29367~~.

Even with the above patch applied, the bootstrap would hang. The remaining reason is ~~MDEV-24313~~ and innodb_doublewrite=ON. So, we must retain the minimum value of innodb_write_io_threads=2 and discard the first hunk of our instrumentation patch.

A further problem would be that many encryption related tests would end up in an infinite loop in buf_pool_t::io_buf_t::reserve(), which is something that had been last changed in ~~MDEV-26055~~. The following patch would fix that:

@@ -1513,14 +1513,17 @@ void buf_pool_t::io_buf_t::close()

   n_slots= 0;

-buf_tmp_buffer_t *buf_pool_t::io_buf_t::reserve()

+buf_tmp_buffer_t *buf_pool_t::io_buf_t::reserve(bool wait_for_reads)

   for (;;)

     for (buf_tmp_buffer_t *s= slots, *e= slots + n_slots; s != e; s++)

       if (s->acquire())

         return s;

+    buf_dblwr.flush_buffered_writes();

     os_aio_wait_until_no_pending_writes();

+    if (!wait_for_reads)

+      continue;

     for (buf_tmp_buffer_t *s= slots, *e= slots + n_slots; s != e; s++)

       if (s->acquire())

         return s;

When this function is being called from buf_page_decrypt_after_read(), we must pass wait_for_reads=false, because the current thread actually is executing in a read completion callback, and therefore os_aio_wait_until_no_pending_reads(); after the continue; would hang. In any case, we must schedule a write of a possibly partial doublewrite batch so that os_aio_wait_until_no_pending_writes(); has a chance of finishing.

With these fixes, a run with OS_AIO_N_PENDING_IOS_PER_THREAD=1 and

./mtr --big-test --force --parallel=auto --mysqld=--loose-innodb-read-io-threads=1 --mysqld=--loose-innodb-write-io-threads=2

completes fine in both 10.5 and 10.6, with the exception of the test sys_vars.innodb_read_io_threads_basic.

The motivation of setting these parameters is to give fewer chances to rr record to schedule threads in an unfair fashion that often leads to 'fake hangs' where write_slots->m_cache.m_pos (or less often read_slots->m_cache.m_pos) would be nonzero and some InnoDB threads are waiting for an extremely long time for a buffer page latch.

Attachments

Issue Links

is caused by

MDEV-26055 Adaptive flushing is still not getting invoked in 10.5.11

Closed

MDEV-29367 Refactor tpool::cache

Closed

relates to

MDEV-34641 MariaDB server getting stuck when ntp server switch time in midst of start up

Open

InnoDB hangs when running out of I/O slots

Details

Description

Attachments

Issue Links

Activity

People

Dates