[MDEV-14196] sys_vars.innodb_buffer_pool_load_now_basic failed in buildbot with timeout Created: 2017-10-29  Updated: 2017-11-05  Resolved: 2017-11-05

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.2
Fix Version/s: 10.0.34, 10.1.29, 10.2.11

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-10651 sys_vars.innodb_buffer_pool_dump_pct_... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest/builds/10584/steps/test_1/logs/stdio

sys_vars.innodb_buffer_pool_load_now_basic 'innodb' w2 [ fail ]
        Test ended at 2017-10-11 11:24:15
 
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-ca=/mnt/buildbot/build/mariadb-10.2.10/mysql-test/std_data/cacert.pem'
Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-cert=/mnt/buildbot/build/mariadb-10.2.10/mysql-test/std_data/client-cert.pem'
Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-key=/mnt/buildbot/build/mariadb-10.2.10/mysql-test/std_data/client-key.pem'
Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown option '--loose-skip-ssl'
--- /mnt/buildbot/build/mariadb-10.2.10/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result	2017-10-11 06:31:58.000000000 -0400
+++ /mnt/buildbot/build/mariadb-10.2.10/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.reject	2017-10-11 11:24:13.832609279 -0400
@@ -4,6 +4,17 @@
 0
 SET GLOBAL innodb_buffer_pool_dump_now = ON;
 SET GLOBAL innodb_buffer_pool_load_now = ON;
+Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) load completed at 171011 11:23:43' &&
+SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
+  FROM information_schema.global_status
+WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'
+Id	User	Host	db	Command	Time	State	Info	Progress
+1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
+4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
+6	root	localhost	test	Query	0	cleaning up	show full processlist	0.000
 SELECT variable_value
 FROM information_schema.global_status
 WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
 
mysqltest: Result length mismatch



 Comments   
Comment by Elena Stepanova [ 2017-11-03 ]

Reproducible reliably with this injection:

diff --git a/storage/innobase/buf/buf0dump.cc b/storage/innobase/buf/buf0dump.cc
index 945a154..0cd4cf2 100644
--- a/storage/innobase/buf/buf0dump.cc
+++ b/storage/innobase/buf/buf0dump.cc
@@ -813,6 +813,7 @@ DECLARE_THREAD(buf_dump_thread)(void*)
 #ifdef WITH_WSREP
                if (!wsrep_recovery) {
 #endif /* WITH_WSREP */
+                       my_sleep(3000000);
                        buf_load();
 #ifdef WITH_WSREP
                }

$ perl ./mtr  sys_vars.innodb_buffer_pool_load_now_basic  
...
sys_vars.innodb_buffer_pool_load_now_basic 'innodb' [ fail ]
        Test ended at 2017-11-03 20:35:35
 
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
--- /data/src/10.2-bug/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result	2017-08-28 14:54:48.487645987 +0300
+++ /data/src/10.2-bug/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.reject	2017-11-03 20:35:35.006826206 +0200
@@ -4,6 +4,17 @@
 0
 SET GLOBAL innodb_buffer_pool_dump_now = ON;
 SET GLOBAL innodb_buffer_pool_load_now = ON;
+Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) load completed at 171103 20:35:04' &&
+SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
+  FROM information_schema.global_status
+WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'
+Id	User	Host	db	Command	Time	State	Info	Progress
+1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
+3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
+9	root	localhost	test	Query	0	init	show full processlist	0.000
 SELECT variable_value
 FROM information_schema.global_status
 WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
 
mysqltest: Result length mismatch

So, the failure happens because of a delayed buffer pool load on server startup.

The problem was introduced by this commit in 10.2:

commit 14b1c8c80d89ab54cadedf94fa7604b8f917aecd
Author: Monty
Date:   Wed Oct 5 01:09:52 2016 +0300
 
    After merge and bug fixes
    - Fixed compiler warnings
    - Removed have_debug.inc from innochecksum_3
    - Fixed race condition in innodb_buffer_pool_load
    ...

particularly, but this change:

--- a/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
+++ b/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
@@ -20,8 +20,10 @@ let $old_status= `SELECT variable_value FROM information_schema.global_status
 # let $wait_condition =
 #  SELECT TRIM(SUBSTR('$old_status', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s');
 # -- source include/wait_condition.inc
-if (`SELECT variable_value LIKE '%dump completed at%' FROM information_schema.global_status
-     WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status'`)
+if (`SELECT count(*) > 0 FROM information_schema.global_status
+     WHERE (LOWER(variable_name) = 'innodb_buffer_pool_dump_status' or
+            LOWER(variable_name) = 'innodb_buffer_pool_load_status')
+            and variable_value LIKE '%completed at%'`)
 {
   -- sleep 2
 }
@@ -42,12 +44,17 @@ let $wait_condition =
 -- let $file = `SELECT CONCAT(@@datadir, @@global.innodb_buffer_pool_filename)`
 -- file_exists $file
 
+let $old_load_status=
+   `SELECT variable_value FROM information_schema.global_status
+    WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'`;
+
 # Load the dump
 SET GLOBAL innodb_buffer_pool_load_now = ON;
 
 # Wait for the load to complete
 let $wait_condition =
-  SELECT SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
+  SELECT variable_value != '$old_load_status' &&
+         SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
   FROM information_schema.global_status
   WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
 --disable_warnings

The change apparently attempted to solve this failure:

@@ -8,4 +8,4 @@
 FROM information_schema.global_status
 WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
 variable_value
-Buffer pool(s) load completed at TIMESTAMP_NOW
+Loaded 1/85 pages
 
mysqltest: Result length mismatch

but it's not accurate.

Here is what happens in the test now:

1) test starts
2) test stores the value of innodb_buffer_pool_dump_status variable;
3) test checks whether innodb_buffer_pool_dump_status or innodb_buffer_pool_load_status contain 'completed';
4) if they do, the test waits for 2 seconds to make sure the timestamp has changed;
5) test executes buffer pool dump and checks that it's reflected in innodb_buffer_pool_dump_status – that the value has changed comparing to the old stored value;
6) test stores the value of innodb_buffer_pool_load_status (it thinks it wouldn't have changed since the beginning of the test, since we didn't do any loads yet);
7) test executes buffer pool load and checks that it's reflected in innodb_buffer_pool_dump_status – that the value has changed comparing to the old stored value

If the test is running after server restart without innodb_buffer_pool_load_at_startup, then at step (3) neither of the variables would contain "completed", as neither a dump nor a load has been done yet, so the test proceeds immediately, and in both further checks, at steps (5) and (7), the value necessarily changes comparing to the original ones.

If the test runs after some other test which executed either a dump or a load, or if the load has already happened on server startup, then on step 4 it waits for 2 seconds, and then at further checks at steps (5) and (7) the values will also change, because they'll contain different timestamps.

The problem happens if the load is delayed, as imitated by the injection above. In this case, when the test checks values at step (3), they still don't contain any "completed", so it proceeds. And after that, it (wrongly) assumes that the values cannot change any longer, except for when we expect it. However, by step 6, the delayed load on startup finishes, the value gets the current timestamp, and thus when the test executes step (7), the new value gets the same timestamp as the previous one, and the check fails.

The solution could be to split the check on step (3), to check the dump value before doing the dump, and load value before doing the load, not both at once. There can still be a tiny window for a race condition, but the accuracy of the checks is limited anyway, because of MDEV-9867.

Generated at Thu Feb 08 08:11:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.