[MDEV-15554] InnoDB page_cleaner shutdown sometimes hangs Created: 2018-03-13  Updated: 2018-05-07  Resolved: 2018-03-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.2, 10.3.0
Fix Version/s: 10.2.14, 10.3.6

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: hang, shutdown, upstream

Issue Links:
Duplicate
is duplicated by MDEV-15683 InnoDB: Waiting for page_cleaner to f... Closed
Relates
relates to MDEV-8188 Server hangs on shutdown in logs_empt... Closed
relates to MDEV-13779 InnoDB fails to shut down purge worke... Closed
relates to MDEV-14080 InnoDB shutdown sometimes hangs Closed
relates to MDEV-14379 encryption.innodb_encrypt_log_corrup... Closed
relates to MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoi... Closed
Sprint: 10.2.14

 Description   

With the merge of the InnoDB changes from MySQL 5.7.9, MariaDB 10.2.2 inherited a new shutdown hang that was introduced in MySQL 5.7.4 or 5.7.5.

The hang is caused by a race condition or a lost signal. The page cleaner coordinator thread would signal the worker threads only once, and then keep waiting for the workers to exit, without ever signalling them again.
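
The lost wakeup described above can be modelled in a small standalone sketch. This is not InnoDB code: `Event`, `PageCleaner`, `worker` and `shutdown_with_resignal` are hypothetical stand-ins, and the assumption that a worker clears the event after checking `is_running` is only one plausible way for the signal to get lost. The waiting loop sets the event on every iteration, so a worker that cleared one wakeup is woken by the next.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an os_event-like object: a flag that stays
// set until somebody explicitly resets it.
class Event {
public:
    void set() {
        { std::lock_guard<std::mutex> g(m_); signaled_ = true; }
        cv_.notify_all();
    }
    void reset() { std::lock_guard<std::mutex> g(m_); signaled_ = false; }
    void wait() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return signaled_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical, heavily reduced model of the shared page cleaner state.
struct PageCleaner {
    Event is_requested;
    std::atomic<bool> is_running{true};
    std::atomic<int> n_workers{0};
};

// Worker: sleep until signalled, exit once is_running is false.
void worker(PageCleaner& pc) {
    for (;;) {
        pc.is_requested.wait();
        if (!pc.is_running) break;
        // Assumption: with no work left, the worker clears the event.
        // If shutdown was signalled between the check above and this
        // reset, that signal is lost.
        pc.is_requested.reset();
    }
    --pc.n_workers;
}

// Coordinator side: keep signalling until every worker has exited.
// Returns the number of workers still registered (0 on success).
int shutdown_with_resignal(int workers) {
    PageCleaner pc;
    pc.n_workers = workers;
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&pc] { worker(pc); });
    }
    pc.is_requested.set();   // an ordinary request, to open the race window
    pc.is_running = false;
    while (pc.n_workers > 0) {
        pc.is_requested.set();  // re-signal on every iteration
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    for (auto& t : threads) t.join();
    assert(pc.n_workers == 0);
    return pc.n_workers;
}
```

Calling `shutdown_with_resignal(4)` should return 0 once every worker has exited. Moving the `set()` call out of the loop reintroduces the window in which a worker can clear the only wakeup and then block forever.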

The following patch fixes the problem:

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index ca647368908..24b27d7462c 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2718,25 +2718,6 @@ buf_flush_page_cleaner_init(void)
 	page_cleaner.is_running = true;
 }
 
-/**
-Close page_cleaner. */
-static
-void
-buf_flush_page_cleaner_close(void)
-{
-	ut_ad(!page_cleaner.is_running);
-
-	/* waiting for all worker threads exit */
-	while (page_cleaner.n_workers) {
-		os_thread_sleep(10000);
-	}
-
-	mutex_destroy(&page_cleaner.mutex);
-
-	os_event_destroy(page_cleaner.is_finished);
-	os_event_destroy(page_cleaner.is_requested);
-}
-
 /**
 Requests for all slots to flush all buffer pool instances.
 @param min_n	wished minimum mumber of blocks flushed
@@ -3438,9 +3419,17 @@ DECLARE_THREAD(buf_flush_page_cleaner_coordinator)(void*)
 	and no more access to page_cleaner structure by them.
 	Wakes worker threads up just to make them exit. */
 	page_cleaner.is_running = false;
-	os_event_set(page_cleaner.is_requested);
 
-	buf_flush_page_cleaner_close();
+	/* waiting for all worker threads exit */
+	while (page_cleaner.n_workers) {
+		os_event_set(page_cleaner.is_requested);
+		os_thread_sleep(10000);
+	}
+
+	mutex_destroy(&page_cleaner.mutex);
+
+	os_event_destroy(page_cleaner.is_finished);
+	os_event_destroy(page_cleaner.is_requested);
 
 	buf_page_cleaner_is_active = false;
 

As noted in MDEV-8188, the hang can be reproduced by running multiple concurrent instances of the server bootstrap, or by repeatedly running a single instance of the server bootstrap:

scripts/mysql_install_db --no-defaults --innodb_buffer_pool_size=2G

