[MDEV-17003] service_manager_extend_timeout() being called too often Created: 2018-08-16  Updated: 2018-12-07  Resolved: 2018-08-21

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1.33, 10.2.15, 10.3.6
Fix Version/s: 10.1.36, 10.2.18, 10.3.10

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: performance, regression, systemd

Issue Links:
Problem/Incident
is caused by MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoi... Closed
Relates
relates to MDEV-16168 Performance regression on sysbench wr... Closed
relates to MDEV-17571 Make systemd timeout behavior more co... Closed
relates to MDEV-17934 Make systemd timeout behavior more co... Closed

 Description   

The function srv_do_purge() is invoking service_manager_extend_timeout() without any rate limit, wasting CPU in both the MariaDB server and systemd.

The call would only make sense during a slow shutdown SET GLOBAL innodb_fast_shutdown=0. Even then, it would make sense to rate-limit the output, and write an accumulated count only every N seconds. The purpose of the call is to prevent systemd from killing the server due to seeming hang, and to inform the user about the progress. It would be even more useful if there was an indication of work that remains to be done.

Also other calls to service_manager_extend_timeout() should be appropriately rate-limited and possibly disabled outside startup or shutdown.

danblack, as the contributor of MDEV-14705, do you know the default timeout before any EXTEND_TIMEOUT_USEC message was sent to systemd?



 Comments   
Comment by Daniel Black [ 2018-08-16 ]

Default time out is https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec= which falls back to https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultTimeoutStartSec= (90s).

Comment by Marko Mäkelä [ 2018-08-21 ]

I checked all the InnoDB calls to service_manager_extend_timeout(), and in addition to the one in srv_purge_should_exit(), which actually looks like it should be executed on a slow shutdown only, there are calls in buf_dump() and log_write_up_to() that could be less frequent:

diff --git a/storage/xtradb/buf/buf0dump.cc b/storage/xtradb/buf/buf0dump.cc
index 90358d34b04..e9168d9f5d5 100644
--- a/storage/xtradb/buf/buf0dump.cc
+++ b/storage/xtradb/buf/buf0dump.cc
@@ -1,7 +1,7 @@
 /*****************************************************************************
 
 Copyright (c) 2011, 2017, Oracle and/or its affiliates. All Rights Reserved.
-Copyright (c) 2017, MariaDB Corporation. All Rights Reserved.
+Copyright (c) 2017, 2018, MariaDB Corporation.
 
 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
@@ -334,7 +334,7 @@ buf_dump(
 					i + 1, srv_buf_pool_instances,
 					j + 1, n_pages);
 			}
-			if ( (j % 1024) == 0) {
+			if (SHUTTING_DOWN() && !(j % 1024)) {
 				service_manager_extend_timeout(INNODB_EXTEND_TIMEOUT_INTERVAL,
 					"Dumping buffer pool "
 					ULINTPF "/" ULINTPF ", "
diff --git a/storage/xtradb/log/log0log.cc b/storage/xtradb/log/log0log.cc
index 1420f5a3a12..c4dc89dc013 100644
--- a/storage/xtradb/log/log0log.cc
+++ b/storage/xtradb/log/log0log.cc
@@ -1561,12 +1561,6 @@ log_write_up_to(
 		return;
 	}
 
-	if (srv_shutdown_state != SRV_SHUTDOWN_NONE) {
-		service_manager_extend_timeout(INNODB_EXTEND_TIMEOUT_INTERVAL,
-					       "log write up to: " LSN_PF,
-					       lsn);
-	}
-
 loop:
 	ut_ad(++loop_count < 100);
 
@@ -1679,6 +1673,12 @@ log_write_up_to(
 	log_sys->buf_free += OS_FILE_LOG_BLOCK_SIZE;
 	log_sys->write_end_offset = log_sys->buf_free;
 
+	if (UNIV_UNLIKELY(srv_shutdown_state != SRV_SHUTDOWN_NONE)) {
+		service_manager_extend_timeout(INNODB_EXTEND_TIMEOUT_INTERVAL,
+					       "log write up to: " LSN_PF,
+					       lsn);
+	}
+
 	group = UT_LIST_GET_FIRST(log_sys->log_groups);
 
 	/* Do the write to the log files */

Next, I am trying to test if srv_purge_should_exit() could emit the output outside a slow shutdown SET GLOBAL innodb_fast_shutdown=0. I am also trying to see how that output could be rate-limited.

Comment by Marko Mäkelä [ 2018-08-24 ]

It turns out that in 10.3.6, as part of this merge the call to service_manager_extend_timeout() was moved to srv_do_purge(), and it is being executed also outside a slow shutdown (innodb_fast_shutdown=0).
Currently, my fix is only present in 10.1 and merged to 10.2. When it is merged to 10.3 (before the 10.3.10 release), we will have to review and test this carefully.
Thanks to yctn for pointing this out!

Generated at Thu Feb 08 08:33:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.