[MDEV-20139] innodb.innodb_buffer_pool_dump_pct failed in buildbot with timeout in wait_condition.inc Created: 2019-07-23  Updated: 2023-12-12  Resolved: 2023-12-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None


 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-xenial-amd64/builds/1311

10.2 666730ee5210dab845bb02bce023e249

innodb.innodb_buffer_pool_dump_pct 'innodb' w3 [ fail ]
        Test ended at 2019-07-03 08:56:14
 
CURRENT_TEST: innodb.innodb_buffer_pool_dump_pct
--- /usr/local/mariadb-10.2.26-linux-x86_64/mysql-test/suite/innodb/r/innodb_buffer_pool_dump_pct.result	2019-07-03 07:37:15.000000000 -0400
+++ /usr/local/mariadb-10.2.26-linux-x86_64/mysql-test/suite/innodb/r/innodb_buffer_pool_dump_pct.reject	2019-07-03 08:56:14.217187024 -0400
@@ -17,6 +17,18 @@
 FROM information_schema.global_status
 WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';
 SET GLOBAL innodb_buffer_pool_dump_now=ON;
+Timeout in wait_condition.inc for SELECT count(*) = 1
+FROM information_schema.global_status
+WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS'
+AND variable_value != @IBPDS
+AND variable_value like 'Buffer pool(s) dump completed at%'
+Id	User	Host	db	Command	Time	State	Info	Progress
+1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
+2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
+16	root	localhost	test	Query	0	init	show full processlist	0.000
 SET GLOBAL innodb_buffer_pool_dump_pct=DEFAULT;
 SET GLOBAL innodb_buffer_pool_filename=DEFAULT;
 DROP TABLE tab5;
 
mysqltest: Result length mismatch



 Comments   
Comment by Alice Sherepa [ 2021-06-24 ]

on 10.6 101da87228f11a1d http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/19914/steps/test/logs/stdio win32

innodb.innodb_buffer_pool_dump_pct 'innodb' w1 [ fail ]
        Test ended at 2021-06-23 17:27:33
 
CURRENT_TEST: innodb.innodb_buffer_pool_dump_pct
--- D:/win32-debug/build/src/mysql-test/suite/innodb/r/innodb_buffer_pool_dump_pct.result	2021-06-23 16:55:26.220356200 +0000
+++ D:\win32-debug\build\src\mysql-test\suite\innodb\r\innodb_buffer_pool_dump_pct.reject	2021-06-23 17:27:33.072445100 +0000
@@ -17,6 +17,13 @@
 FROM information_schema.global_status
 WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';
 SET GLOBAL innodb_buffer_pool_dump_now=ON;
+Timeout in wait_condition.inc for SELECT count(*) = 1
+FROM information_schema.global_status
+WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS'
+AND variable_value != @IBPDS
+AND variable_value like 'Buffer pool(s) dump completed at%'
+Id	User	Host	db	Command	Time	State	Info	Progress
+7	root	localhost:52422	test	Query	0	starting	show full processlist	0.000
 SET GLOBAL innodb_buffer_pool_dump_pct=DEFAULT;
 SET GLOBAL innodb_buffer_pool_filename=DEFAULT;
 DROP TABLE tab5;
 
mysqltest: Result length mismatch

Comment by Marko Mäkelä [ 2023-11-29 ]

This test still fails regularly, several times per month.

A simple fix would be to not use a separate thread for the buffer pool resizing, but simply make the SET GLOBAL innodb_buffer_pool_size thread to resize the buffer pool and return when done. ralf.gebhardt, would that be an acceptable change of behaviour in GA releases?

Comment by Ralf Gebhardt [ 2023-12-11 ]

If this is a way to fix the bug while it is not breaking compatibility, why not? Is there anything special to this bug fix compared to others? Are there any user visible changes which would require changes in an application?

Comment by Marko Mäkelä [ 2023-12-12 ]

Sorry, I meant that SET GLOBAL innodb_buffer_dump_now=ON could block until the operation has been completed. There is some similar logic around SET GLOBAL innodb_buffer_pool_size as well.

Given that the failures are rather rare and possibly occur when the CI workers are under heavy load, I suspect that the failure probability could be reduced by the following:

diff --git a/mysql-test/suite/innodb/t/innodb_buffer_pool_dump_pct.test b/mysql-test/suite/innodb/t/innodb_buffer_pool_dump_pct.test
index 381091165ef..ecdeb0eda21 100644
--- a/mysql-test/suite/innodb/t/innodb_buffer_pool_dump_pct.test
+++ b/mysql-test/suite/innodb/t/innodb_buffer_pool_dump_pct.test
@@ -46,9 +46,9 @@ FROM information_schema.global_status
 WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';
 SET GLOBAL innodb_buffer_pool_dump_now=ON;
 
-# Sleep one second in order to ensure that the time stamp is
+# Sleep at least one second in order to ensure that the time stamp is
 # different at next dump
---sleep 1
+--sleep 1.5
 
 let $wait_condition = SELECT count(*) = 1
 FROM information_schema.global_status
@@ -70,9 +70,9 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';
 
 SET GLOBAL innodb_buffer_pool_dump_now=ON;
 
-# Sleep one second in order to ensure that the time stamp is
+# Sleep at least one second in order to ensure that the time stamp is
 # different at next dump
---sleep 1
+--sleep 1.5
 
 let $wait_condition = SELECT count(*) = 1
 FROM information_schema.global_status

That is, under some circumstances the time stamp in the status message (measured in seconds) does not advance even though the client attempted to sleep for 1 second. The sleep statement is implemented by do_sleep(), which invokes my_sleep(). It will invoke the Microsoft Windows Sleep() or POSIX select(2), ignoring any errors. On GNU/Linux, the timeout interval will be rounded up to the system clock granularity, so presumably the sleep should not be any shorter than requested. However, EINTR is documented as a possible error. On Microsoft Windows (where this test has failed as well), it is documented that the actual sleep duration may be less than the requested delay.

I am not convinced that increasing the delay is a good approach. I think that it is better to first wait that the file ib_buffer_pool has been created, and then wait that the status message indicates completion. The actual ‘interesting part’ of the test is to show that the size of buffer pool dumps depends on the parameter innodb_buffer_pool_dump_pct.

Comment by Marko Mäkelä [ 2023-12-12 ]

The test got more than 100× faster on Microsoft Windows Debug:

innodb.innodb_buffer_pool_dump_pct 'innodb' w26 [ pass ]    169

On my Debian GNU/Linux system, the test would at best complete in 14 or 19 milliseconds, for RelWithDebInfo and Debug.

Generated at Thu Feb 08 08:57:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.