Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2, 11.3(EOL), 11.4, 11.5(EOL), 11.6, 11.7
Description
While testing MDEV-33853, mleich produced a core dump of a server hang, which ended in an intentional crash with the infamous message that the innodb_fatal_semaphore_wait_threshold for dict_sys.latch had been exceeded.
In the stack traces that I analyzed, there were several threads trying to execute SET GLOBAL innodb_log_file_size, as well as one DDL operation that was waiting for exclusive log_sys.latch while holding exclusive dict_sys.latch, ultimately blocking all other threads. A simplified model of this blocking chain is sketched below.
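For illustration only, here is a minimal, self-contained C++ model of that chain (hypothetical toy code, not MariaDB source): dict_latch and log_latch stand in for dict_sys.latch and log_sys.latch, and the watchdog at the end plays the role of the innodb_fatal_semaphore_wait_threshold check, with the timeout shortened to seconds.

#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <shared_mutex>
#include <thread>

using namespace std::chrono_literals;

// Stand-ins for the real latches; this is a toy model, not MariaDB code.
std::shared_timed_mutex dict_latch; // models dict_sys.latch
std::shared_timed_mutex log_latch;  // models log_sys.latch

int main()
{
  // A stuck SET GLOBAL innodb_log_file_size holds the log latch
  // exclusively and never releases it.
  std::thread([] {
    std::unique_lock<std::shared_timed_mutex> log(log_latch);
    std::this_thread::sleep_for(1h);
  }).detach();
  std::this_thread::sleep_for(100ms); // let the resizer acquire first

  // A DDL thread acquires the dict latch exclusively and then blocks on
  // the log latch, so the dict latch can never be released either.
  std::thread([] {
    std::unique_lock<std::shared_timed_mutex> dict(dict_latch);
    std::unique_lock<std::shared_timed_mutex> log(log_latch); // blocks forever
  }).detach();
  std::this_thread::sleep_for(100ms);

  // Every other thread now piles up behind the dict latch. A watchdog that
  // cannot acquire it within the threshold crashes the process on purpose,
  // which is what the fatal semaphore wait message reports.
  if (!dict_latch.try_lock_shared_for(2s))
  {
    std::fputs("fatal semaphore wait for dict_sys.latch exceeded\n", stderr);
    std::abort();
  }
}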
I think that the root cause is flawed logic in log_resize_acquire(). It could also make sense to protect that function with LOCK_global_system_variables, so that multiple instances of log_resize_acquire() cannot be executed concurrently. A fix might look as follows; it will need some additional stress testing to be validated.
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index ce2d3958f9c..27adf670b11 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -18518,9 +18518,7 @@ buffer_pool_load_abort(
 static void innodb_log_file_buffering_update(THD *thd, st_mysql_sys_var*,
                                              void *, const void *save)
 {
-  mysql_mutex_unlock(&LOCK_global_system_variables);
   log_sys.set_buffered(*static_cast<const my_bool*>(save));
-  mysql_mutex_lock(&LOCK_global_system_variables);
 }
 #endif
 
@@ -18528,7 +18526,6 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
                                         void *var, const void *save)
 {
   ut_ad(var == &srv_log_file_size);
-  mysql_mutex_unlock(&LOCK_global_system_variables);
 
   if (high_level_read_only)
     ib_senderrf(thd, IB_LOG_LEVEL_ERROR, ER_READ_ONLY_MODE);
@@ -18551,13 +18548,15 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
       ib_senderrf(thd, IB_LOG_LEVEL_ERROR, ER_CANT_CREATE_HANDLER_FILE);
       break;
     case log_t::RESIZE_STARTED:
+      mysql_mutex_unlock(&LOCK_global_system_variables);
       const lsn_t start{log_sys.resize_in_progress()};
       for (timespec abstime;;)
       {
         if (thd_kill_level(thd))
         {
+          mysql_mutex_lock(&LOCK_global_system_variables);
           log_sys.resize_abort();
-          break;
+          return;
         }
 
         set_timespec(abstime, 5);
@@ -18588,9 +18587,9 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
         if (!resizing || resizing > start /* only wait for our resize */)
           break;
       }
+      mysql_mutex_lock(&LOCK_global_system_variables);
     }
   }
-  mysql_mutex_lock(&LOCK_global_system_variables);
 }
 
 static void innodb_log_spin_wait_delay_update(THD *, st_mysql_sys_var*,
diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
index d7aae556ce0..7c94876996c 100644
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -384,6 +384,8 @@ void log_t::close_file()
 /** Acquire all latches that protect the log. */
 static void log_resize_acquire()
 {
+  mysql_mutex_assert_owner(&LOCK_global_system_variables);
+
   if (!log_sys.is_pmem())
   {
     while (flush_lock.acquire(log_sys.get_lsn() + 1, nullptr) !=
I am also wondering whether the logic around flush_lock.acquire() and write_lock.acquire(), which is partly visible above, is correct.
wlad, I wonder if there is any chance that another thread in log_write_up_to() could get "in between" and start to wait for log_sys.latch before log_resize_acquire() completes log_sys.latch.wr_lock()? If yes, is there any way to acquire the write_lock and flush_lock with an even larger LSN (such as LSN_MAX) and rewind to the actual LSN once done? I have a concern that even with the above patch, a hang could be possible.
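To make that question concrete, here is a hedged sketch of the "reserve at LSN_MAX, rewind later" idea against a simplified stand-in for the group commit lock. reservation_lock and its acquire()/rewind()/release() methods are assumptions for illustration and do not match the real group_commit_lock API; the point is only the ordering: reserve at LSN_MAX before log_sys.latch.wr_lock(), so that no concurrent log_write_up_to() can slip in between, then rewind to the LSN that is actually needed.

#include <condition_variable>
#include <cstdint>
#include <limits>
#include <mutex>

using lsn_t = uint64_t;
constexpr lsn_t LSN_MAX = std::numeric_limits<lsn_t>::max();

// Toy stand-in for the server's group commit lock: one reservation at a time.
class reservation_lock
{
  std::mutex m;
  std::condition_variable cv;
  lsn_t reserved = 0; // LSN currently reserved; 0 means free

public:
  // Block until the lock is free, then record a reservation up to `lsn`.
  void acquire(lsn_t lsn)
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return reserved == 0; });
    reserved = lsn;
  }
  // Lower an over-reservation (e.g. LSN_MAX) to the LSN actually needed.
  void rewind(lsn_t lsn)
  {
    std::lock_guard<std::mutex> lk(m);
    reserved = lsn;
  }
  void release()
  {
    { std::lock_guard<std::mutex> lk(m); reserved = 0; }
    cv.notify_all();
  }
};

reservation_lock flush_lock_model, write_lock_model;

// Sketch of log_resize_acquire() under this model: reserving LSN_MAX first
// closes the window in which a log_write_up_to() caller could queue up
// before the exclusive latch is taken; the reservation is rewound afterwards.
void log_resize_acquire_sketch(lsn_t current_lsn)
{
  flush_lock_model.acquire(LSN_MAX);
  write_lock_model.acquire(LSN_MAX);
  // ... log_sys.latch.wr_lock() would happen here ...
  flush_lock_model.rewind(current_lsn + 1);
  write_lock_model.rewind(current_lsn + 1);
}

int main()
{
  log_resize_acquire_sketch(42); // hypothetical current LSN
  flush_lock_model.release();
  write_lock_model.release();
}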
Issue Links
- is caused by:
  - MDEV-27812 Allow innodb_log_file_size to change without server restart (Closed)
- relates to:
  - MDEV-31311 The test innodb.log_file_size_online occasionally hangs (Closed)
  - MDEV-33853 Async rollback prepared transactions during binlog crash recovery (Closed)