[MDEV-9713] Sporadic test failure: sys_vars.innodb_buffer_pool_load_now_basic Created: 2016-03-11  Updated: 2017-11-05  Resolved: 2016-04-04

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.25, 10.1.14

Type: Bug Priority: Minor
Reporter: Otto Kekäläinen Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-9867 InnoDB can be a second behind the server Closed
relates to MDEV-10651 sys_vars.innodb_buffer_pool_dump_pct_... Closed

 Description   

I noticed in Debian build logs that this fails sporadically:

sys_vars.innodb_buffer_pool_load_now_basic 'xtradb' w6 [ fail ]
        Test ended at 2016-03-12 03:03:35
 
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
mysqltest: At line 23: command "file_exists" failed with error: 1  my_errno: 0  errno: 2
 
The result from queries just before the failure was:
SET @orig = @@global.innodb_buffer_pool_load_now;
SELECT @orig;
@orig
0
SET GLOBAL innodb_buffer_pool_dump_now = ON;

(from https://tests.reproducible-builds.org/logs/unstable/amd64/mariadb-10.0_10.0.24-4.build2.log.gz)

elenst mentioned it in TODO-723, so it is a known one, but there was no MDEV so I created one now.



 Comments   
Comment by Elena Stepanova [ 2016-03-11 ]

It is a basic race condition + undesirable interrelation between tests. It is easy reproduce by adding a sleep somewhere here:

diff --git a/storage/xtradb/buf/buf0dump.cc b/storage/xtradb/buf/buf0dump.cc
index 2b3b506..60e2f36 100644
--- a/storage/xtradb/buf/buf0dump.cc
+++ b/storage/xtradb/buf/buf0dump.cc
@@ -193,6 +193,8 @@ buf_dump(
                    "%s%c%s", srv_data_home, SRV_PATH_SEPARATOR,
                    srv_buf_dump_filename);
 
+    my_sleep(1000000);
+
        ut_snprintf(tmp_filename, sizeof(tmp_filename),
                    "%s.incomplete", full_filename);

And running two tests in a specific order:

perl ./mtr sys_vars.innodb_buffer_pool_filename_basic,xtradb sys_vars.innodb_buffer_pool_load_now_basic,xtradb --noreorder

What happens here is that the first test, innodb_buffer_pool_filename_basic, executes buffer pool dump into a different file name and sets the status variable innodb_buffer_pool_dump_status 'Buffer pool(s) dump completed at'. It then properly restores the file name to the default value.
Then the second test (the one in question) starts executing. It triggers buffer pool dump in the background thread and starts watching the status variable innodb_buffer_pool_dump_status waiting for it to say that the dump was completed. If it checks the value for the first time after the dump started, it's fine, because the value will be set to something about "in progress", and will be changed to "completed" only when the file is created. But if it checks the value before the dump started, which we cause by adding the sleep, the previous status variable value is still "dump completed", so the wait ends immediately. However, the file isn't there yet (because the previous test dumped into a different file name, remember?). Hence the error.

Comment by Elena Stepanova [ 2016-03-12 ]

To jplindst:
here is one way to fix it. Please check if you agree, or you can think of something better.
For a note, the added wait condition is just to avoid a sleep. It saves ~1 sec execution time, but if you think it's not worth the complexity, please feel free to replace it with sleep 1.
The additional clause in the second wait condition is necessary anyway.

diff --git a/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test b/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
index fe376a3..5cf8156 100644
--- a/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
+++ b/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
@@ -8,12 +8,20 @@
 SET @orig = @@global.innodb_buffer_pool_load_now;
 SELECT @orig;
 
+let $old_status= `SELECT variable_value FROM information_schema.global_status
+    WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status'`;
+
+let $wait_condition =
+  SELECT TRIM(SUBSTR('$old_status', -8)) != DATE_FORMAT(CURTIME(), '%l:%i:%s');
+-- source include/wait_condition.inc
+
 # Do the dump
 SET GLOBAL innodb_buffer_pool_dump_now = ON;
 
 # Wait for the dump to complete
 let $wait_condition =
-  SELECT SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) dump completed at '
+  SELECT variable_value != '$old_status'
+     AND SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) dump completed at '
   FROM information_schema.global_status
   WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status';
 -- source include/wait_condition.inc

Comment by Jan Lindström (Inactive) [ 2016-03-12 ]

ok to push your fix.

Comment by Elena Stepanova [ 2016-03-12 ]

Comparing to the patch above, changed the hour format from '%l' to '%k', as '%l' didn't work correctly for afternoon values.

https://github.com/MariaDB/server/commit/0125e588f3967bcaf01cf60b6a64e92758158ac4

Comment by Elena Stepanova [ 2016-03-13 ]

Got a new failure with the change – also sporadic, happened only once:
http://buildbot.askmonty.org/buildbot/builders/kvm-deb-precise-amd64/builds/5853/steps/test_4/logs/stdio

sys_vars.innodb_buffer_pool_load_now_basic 'innodb_plugin' w3 [ fail ]
        Test ended at 2016-03-12 20:13:43
 
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
--- /usr/share/mysql/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result	2016-03-12 18:03:51.000000000 +0200
+++ /run/shm/var/3/log/innodb_buffer_pool_load_now_basic.reject	2016-03-12 20:13:43.294185838 +0200
@@ -3,6 +3,12 @@
 @orig
 0
 SET GLOBAL innodb_buffer_pool_dump_now = ON;
+Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) dump completed at 160312 20:13:12'
+     AND SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) dump completed at '
+  FROM information_schema.global_status
+WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status'
+Id	User	Host	db	Command	Time	State	Info	Progress
+30	root	localhost	test	Query	0	init	show full processlist	0.000
 SET GLOBAL innodb_buffer_pool_load_now = ON;
 SELECT variable_value
 FROM information_schema.global_status

From the error log, it's pretty clear why the wait condition timed out:

CURRENT_TEST: sys_vars.innodb_buffer_pool_filename_basic
2016-03-12 20:13:12 7facde3fd700 InnoDB: Dumping buffer pool(s) to .//innodb_foobar_dump
2016-03-12 20:13:12 7facde3fd700 InnoDB: Buffer pool(s) dump completed at 160312 20:13:12
CURRENT_TEST: sys_vars.innodb_buffer_pool_instances_basic
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_abort_basic
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_at_startup_basic
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
2016-03-12 20:13:12 7facde3fd700 InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool
2016-03-12 20:13:12 7facde3fd700 InnoDB: Buffer pool(s) dump completed at 160312 20:13:12
2016-03-12 20:13:43 7facde3fd700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2016-03-12 20:13:43 7facde3fd700 InnoDB: Buffer pool(s) load completed at 160312 20:13:43

So, the clever wait for the next second did not work, the test performed the dump the same second as the previous one happened, hence the variable value never changed, and wait condition did not succeed.

The problem is, I have no idea why the wait for the next second did not work, it's pretty straightforward. Unfortunately, there is no general log to see the details. I thought maybe there is some timezone specifics on the machine, or something of the sort, but I ran the same VM, and there seems to be nothing special about it, everything works as expected. I'll keep thinking.

Comment by Elena Stepanova [ 2016-03-13 ]

It turned out to be easy enough to reproduce by running the test with repeat:

....
sys_vars.innodb_buffer_pool_load_now_basic 'innodb_plugin' [ pass ]    713
sys_vars.innodb_buffer_pool_load_now_basic 'innodb_plugin' [ pass ]    814
sys_vars.innodb_buffer_pool_load_now_basic 'innodb_plugin' [ pass ]    813
sys_vars.innodb_buffer_pool_load_now_basic 'innodb_plugin' [ fail ]
        Test ended at 2016-03-12 20:10:07
 
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
--- /bld/10.0-release/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result      2016-01-17 10:16:40.000000000 -0500
+++ /bld/10.0-release/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.reject      2016-03-12 20:10:07.434489216 -0500
@@ -3,6 +3,12 @@
 @orig
 0
 SET GLOBAL innodb_buffer_pool_dump_now = ON;
+Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) dump completed at 160312 20:09:36'
+     AND SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) dump completed at '
+  FROM information_schema.global_status
+WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status'
+Id     User    Host    db      Command Time    State   Info    Progress
+79     root    localhost       test    Query   0       init    show full processlist   0.000
 SET GLOBAL innodb_buffer_pool_load_now = ON;
 SELECT variable_value
 FROM information_schema.global_status

Here is what we have in the error log:

...
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
2016-03-12 20:09:35 7fd8705fb700 InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool
2016-03-12 20:09:35 7fd8705fb700 InnoDB: Buffer pool(s) dump completed at 160312 20:09:35
2016-03-12 20:09:35 7fd8705fb700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2016-03-12 20:09:35 7fd8705fb700 InnoDB: Buffer pool(s) load completed at 160312 20:09:35
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
2016-03-12 20:09:36 7fd8705fb700 InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool
2016-03-12 20:09:36 7fd8705fb700 InnoDB: Buffer pool(s) dump completed at 160312 20:09:36
2016-03-12 20:09:36 7fd8705fb700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2016-03-12 20:09:36 7fd8705fb700 InnoDB: Buffer pool(s) load completed at 160312 20:09:36
CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
2016-03-12 20:09:36 7fd8705fb700 InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool
2016-03-12 20:09:36 7fd8705fb700 InnoDB: Buffer pool(s) dump completed at 160312 20:09:36
2016-03-12 20:10:07 7fd8705fb700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2016-03-12 20:10:07 7fd8705fb700 InnoDB: Buffer pool(s) load completed at 160312 20:10:07

That is, the last test says the dump was performed on the same second as the previous one.
But here is what we have in the general log:

                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
160312 20:09:37    79 Query     SELECT TRIM(SUBSTR('Buffer pool(s) dump completed at 160312 20:09:36', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s')
                   79 Query     SET GLOBAL innodb_buffer_pool_dump_now = ON
                   79 Query     SELECT variable_value != 'Buffer pool(s) dump completed at 160312 20:09:36'
     AND SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) dump completed at '
  FROM information_schema.global_status

So, the waiting on the current time worked perfectly – the test did wait till CURTIME switched to the next second, only only then, at 20:09:37, the dump was performed.
However, InnoDB still wrote its records to the error log with the older timestamp, and used the same for the variable value. Further failure is caused by this inconsistency.

Thinking about it, it's no news that InnoDB has its own understanding of time, but I didn't know it would affect seconds. I don't know if it's a bug, maybe it is, but it's so tiny that I doubt anybody would care fixing it. Just for accuracy, lets ask jplindst: see the explanation above – do you want to look into why InnoDB would use different time comparing to the server?

If not, I'll think of some other way to fix the test. sleep 1 also does not help here, it can hit the same problem; maybe longer sleeps could do the trick, but then again, if InnoDB can be 1 second late, maybe it can be 10 seconds late?

On a separate note, while running the test repeatedly I found out that the final part of the test, the LOAD part, also does not work properly, on a similar reason to what we had here initially: if the status variable was previously set, the last SELECT from I_S can happen too early and show an unexpected value. I'll fix it along with re-fixing the current issue.

Comment by Jan Lindström (Inactive) [ 2016-03-14 ]

I think this is because InnoDB log is written first and then (when stmt completed ?) to general log, if we are near to end of second it could mean that there looks like there is a second difference. Both logs should have more accuracy on timestamps.

Comment by Elena Stepanova [ 2016-03-14 ]

jplindst,
I don't think it can be so. First, statements are written to the general log before they are executed, otherwise we would never have crashing queries in general logs, and we always do.
But more importantly, even the previous statement, the last SELECT TRIM, is also written with the newer timestamp. Now, it cannot possibly be explained by InnoDB writing its log before the SET statement was written to the general log. By this time, InnoDB knew nothing about future request for the dump.

Comment by Elena Stepanova [ 2016-03-29 ]

As discussed with jplindst, the reason of the issue with InnoDB timestamps is currently unknown, so I will change the test not to fail due to it, and instead will file a bug about the timestamp problem.

Comment by Elena Stepanova [ 2016-04-04 ]

The InnoDB timestamp issue filed as MDEV-9867, the wait condition has been replaced with a conditional sleep.

https://github.com/MariaDB/server/commit/f8adeccd78bff80725a95b73447e34a5f4528179

Generated at Thu Feb 08 07:36:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.