[MDEV-33361] Excessive delays in SET GLOBAL innodb_log_file_size Created: 2024-02-02  Updated: 2024-02-02  Resolved: 2024-02-02

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 10.11.8, 11.0.6, 11.1.5, 11.2.4, 11.3.3, 11.4.2

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: hang, performance

Issue Links:
Problem/Incident
is caused by MDEV-27812 Allow innodb_log_file_size to change ... Closed

Description

When mleich tested a port of MDEV-27812 to MariaDB Enterprise Server 10.6, he noticed that executing multiple concurrent SET GLOBAL innodb_log_file_size=...; statements would result in long waits. The culprit turned out to be the wrong condition variable in a timed wait:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 407834f2008..06af558a2e7 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -18452,7 +18452,7 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
         const bool in_progress(buf_pool.get_oldest_modification(LSN_MAX) <
                                log_sys.resize_in_progress());
         if (in_progress)
-          my_cond_timedwait(&buf_pool.do_flush_list,
+          my_cond_timedwait(&buf_pool.done_flush_list,
                             &buf_pool.flush_list_mutex.m_mutex, &abstime);
         mysql_mutex_unlock(&buf_pool.flush_list_mutex);
         if (!log_sys.resize_in_progress())

The condition variable buf_pool.done_flush_list is broadcast by buf_flush_page_cleaner() at the end of each batch, which is when the log checkpoint can advance and when any log resizing may be completed. The condition variable buf_pool.do_flush_list serves a different purpose: it wakes up the buf_flush_page_cleaner() thread because there is work to do. If no thread is signaling buf_pool.do_flush_list, the timed wait only ends when its timeout expires, so this loop could wait up to 5 seconds longer than necessary for the log resizing to be completed. By consuming signals intended for buf_flush_page_cleaner(), the wait could also prevent that thread from waking up.
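
The effect can be illustrated outside the server with a minimal, self-contained sketch. It is not InnoDB code: FlushState, time_until_resize_seen(), the 50 ms simulated batch, and the use of std::condition_variable in place of my_cond_timedwait() are all assumptions made for illustration. Only the names of the two condition variables and the mutex mirror the patch above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using std::chrono::milliseconds;

// Hypothetical stand-in for the relevant members of buf_pool.
struct FlushState {
  std::mutex flush_list_mutex;
  std::condition_variable do_flush_list;    // "there is work": wakes the cleaner
  std::condition_variable done_flush_list;  // "a batch ended": wakes waiters
  bool resize_in_progress = true;           // protected by flush_list_mutex
};

// Returns how long the caller waited until it observed the end of the resize.
// wait_on_done=true models the fixed code, false models the original bug.
milliseconds time_until_resize_seen(bool wait_on_done, milliseconds timeout) {
  assert(timeout.count() > 0);
  FlushState st;
  const auto start = std::chrono::steady_clock::now();

  // Hold the mutex before starting the cleaner, so the cleaner can only
  // finish its batch once this thread is blocked inside the timed wait.
  std::unique_lock<std::mutex> lk(st.flush_list_mutex);

  std::thread cleaner([&st] {
    std::this_thread::sleep_for(milliseconds(50));  // simulated flush batch
    {
      std::lock_guard<std::mutex> g(st.flush_list_mutex);
      st.resize_in_progress = false;  // the resize completes with the batch
    }
    st.done_flush_list.notify_all();  // only this condition is broadcast
  });

  std::condition_variable &cv =
      wait_on_done ? st.done_flush_list : st.do_flush_list;
  while (st.resize_in_progress)
    cv.wait_for(lk, timeout);  // nobody signals do_flush_list in this sketch

  lk.unlock();
  cleaner.join();
  return std::chrono::duration_cast<milliseconds>(
      std::chrono::steady_clock::now() - start);
}
```

Calling time_until_resize_seen(true, milliseconds(400)) should return shortly after the simulated batch ends, whereas time_until_resize_seen(false, milliseconds(400)) normally returns only once the timeout has expired, because nothing in the sketch ever signals do_flush_list. A spurious wakeup could shorten the second case, which is permitted by the condition variable contract.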

