Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14196

sys_vars.innodb_buffer_pool_load_now_basic failed in buildbot with timeout

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest/builds/10584/steps/test_1/logs/stdio

      sys_vars.innodb_buffer_pool_load_now_basic 'innodb' w2 [ fail ]
              Test ended at 2017-10-11 11:24:15
       
      CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
      Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-ca=/mnt/buildbot/build/mariadb-10.2.10/mysql-test/std_data/cacert.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-cert=/mnt/buildbot/build/mariadb-10.2.10/mysql-test/std_data/client-cert.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-key=/mnt/buildbot/build/mariadb-10.2.10/mysql-test/std_data/client-key.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.10/libmysqld/examples/mysqltest_embedded: unknown option '--loose-skip-ssl'
      --- /mnt/buildbot/build/mariadb-10.2.10/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result	2017-10-11 06:31:58.000000000 -0400
      +++ /mnt/buildbot/build/mariadb-10.2.10/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.reject	2017-10-11 11:24:13.832609279 -0400
      @@ -4,6 +4,17 @@
       0
       SET GLOBAL innodb_buffer_pool_dump_now = ON;
       SET GLOBAL innodb_buffer_pool_load_now = ON;
      +Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) load completed at 171011 11:23:43' &&
      +SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
      +  FROM information_schema.global_status
      +WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'
      +Id	User	Host	db	Command	Time	State	Info	Progress
      +1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
      +4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
      +2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
      +3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
      +5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
      +6	root	localhost	test	Query	0	cleaning up	show full processlist	0.000
       SELECT variable_value
       FROM information_schema.global_status
       WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
       
      mysqltest: Result length mismatch
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - - edited

            Reproducible reliably with this injection:

            diff --git a/storage/innobase/buf/buf0dump.cc b/storage/innobase/buf/buf0dump.cc
            index 945a154..0cd4cf2 100644
            --- a/storage/innobase/buf/buf0dump.cc
            +++ b/storage/innobase/buf/buf0dump.cc
            @@ -813,6 +813,7 @@ DECLARE_THREAD(buf_dump_thread)(void*)
             #ifdef WITH_WSREP
                            if (!wsrep_recovery) {
             #endif /* WITH_WSREP */
            +                       my_sleep(3000000);
                                    buf_load();
             #ifdef WITH_WSREP
                            }
            

            $ perl ./mtr  sys_vars.innodb_buffer_pool_load_now_basic  
            ...
            sys_vars.innodb_buffer_pool_load_now_basic 'innodb' [ fail ]
                    Test ended at 2017-11-03 20:35:35
             
            CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic
            --- /data/src/10.2-bug/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result	2017-08-28 14:54:48.487645987 +0300
            +++ /data/src/10.2-bug/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.reject	2017-11-03 20:35:35.006826206 +0200
            @@ -4,6 +4,17 @@
             0
             SET GLOBAL innodb_buffer_pool_dump_now = ON;
             SET GLOBAL innodb_buffer_pool_load_now = ON;
            +Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) load completed at 171103 20:35:04' &&
            +SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
            +  FROM information_schema.global_status
            +WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'
            +Id	User	Host	db	Command	Time	State	Info	Progress
            +1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
            +3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            +2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            +4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            +5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
            +9	root	localhost	test	Query	0	init	show full processlist	0.000
             SELECT variable_value
             FROM information_schema.global_status
             WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
             
            mysqltest: Result length mismatch
            

            So, the failure happens because of a delayed buffer pool load on server startup.

            The problem was introduced by this commit in 10.2:

            commit 14b1c8c80d89ab54cadedf94fa7604b8f917aecd
            Author: Monty
            Date:   Wed Oct 5 01:09:52 2016 +0300
             
                After merge and bug fixes
                - Fixed compiler warnings
                - Removed have_debug.inc from innochecksum_3
                - Fixed race condition in innodb_buffer_pool_load
                ...
            

            particularly, but this change:

            --- a/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
            +++ b/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test
            @@ -20,8 +20,10 @@ let $old_status= `SELECT variable_value FROM information_schema.global_status
             # let $wait_condition =
             #  SELECT TRIM(SUBSTR('$old_status', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s');
             # -- source include/wait_condition.inc
            -if (`SELECT variable_value LIKE '%dump completed at%' FROM information_schema.global_status
            -     WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status'`)
            +if (`SELECT count(*) > 0 FROM information_schema.global_status
            +     WHERE (LOWER(variable_name) = 'innodb_buffer_pool_dump_status' or
            +            LOWER(variable_name) = 'innodb_buffer_pool_load_status')
            +            and variable_value LIKE '%completed at%'`)
             {
               -- sleep 2
             }
            @@ -42,12 +44,17 @@ let $wait_condition =
             -- let $file = `SELECT CONCAT(@@datadir, @@global.innodb_buffer_pool_filename)`
             -- file_exists $file
             
            +let $old_load_status=
            +   `SELECT variable_value FROM information_schema.global_status
            +    WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'`;
            +
             # Load the dump
             SET GLOBAL innodb_buffer_pool_load_now = ON;
             
             # Wait for the load to complete
             let $wait_condition =
            -  SELECT SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
            +  SELECT variable_value != '$old_load_status' &&
            +         SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at '
               FROM information_schema.global_status
               WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
             --disable_warnings
            

            The change apparently attempted to solve this failure:

            @@ -8,4 +8,4 @@
             FROM information_schema.global_status
             WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';
             variable_value
            -Buffer pool(s) load completed at TIMESTAMP_NOW
            +Loaded 1/85 pages
             
            mysqltest: Result length mismatch
            

            but it's not accurate.

            Here is what happens in the test now:

            1) test starts
            2) test stores the value of innodb_buffer_pool_dump_status variable;
            3) test checks whether innodb_buffer_pool_dump_status or innodb_buffer_pool_load_status contain 'completed';
            4) if they do, the test waits for 2 seconds to make sure the timestamp has changed;
            5) test executes buffer pool dump and checks that it's reflected in innodb_buffer_pool_dump_status – that the value has changed comparing to the old stored value;
            6) test stores the value of innodb_buffer_pool_load_status (it thinks it wouldn't have changed since the beginning of the test, since we didn't do any loads yet);
            7) test executes buffer pool load and checks that it's reflected in innodb_buffer_pool_dump_status – that the value has changed comparing to the old stored value

            If the test is running after server restart without innodb_buffer_pool_load_at_startup, then at step (3) neither of the variables would contain "completed", as neither a dump nor a load has been done yet, so the test proceeds immediately, and in both further checks, at steps (5) and (7), the value necessarily changes comparing to the original ones.

            If the test runs after some other test which executed either a dump or a load, or if the load has already happened on server startup, then on step 4 it waits for 2 seconds, and then at further checks at steps (5) and (7) the values will also change, because they'll contain different timestamps.

            The problem happens if the load is delayed, as imitated by the injection above. In this case, when the test checks values at step (3), they still don't contain any "completed", so it proceeds. And after that, it (wrongly) assumes that the values cannot change any longer, except for when we expect it. However, by step 6, the delayed load on startup finishes, the value gets the current timestamp, and thus when the test executes step (7), the new value gets the same timestamp as the previous one, and the check fails.

            The solution could be to split the check on step (3), to check the dump value before doing the dump, and load value before doing the load, not both at once. There can still be a tiny window for a race condition, but the accuracy of the checks is limited anyway, because of MDEV-9867.

            elenst Elena Stepanova added a comment - - edited Reproducible reliably with this injection: diff --git a/storage/innobase/buf/buf0dump.cc b/storage/innobase/buf/buf0dump.cc index 945a154..0cd4cf2 100644 --- a/storage/innobase/buf/buf0dump.cc +++ b/storage/innobase/buf/buf0dump.cc @@ -813,6 +813,7 @@ DECLARE_THREAD(buf_dump_thread)(void*) #ifdef WITH_WSREP if (!wsrep_recovery) { #endif /* WITH_WSREP */ + my_sleep(3000000); buf_load(); #ifdef WITH_WSREP } $ perl ./mtr sys_vars.innodb_buffer_pool_load_now_basic ... sys_vars.innodb_buffer_pool_load_now_basic 'innodb' [ fail ] Test ended at 2017-11-03 20:35:35   CURRENT_TEST: sys_vars.innodb_buffer_pool_load_now_basic --- /data/src/10.2-bug/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.result 2017-08-28 14:54:48.487645987 +0300 +++ /data/src/10.2-bug/mysql-test/suite/sys_vars/r/innodb_buffer_pool_load_now_basic.reject 2017-11-03 20:35:35.006826206 +0200 @@ -4,6 +4,17 @@ 0 SET GLOBAL innodb_buffer_pool_dump_now = ON; SET GLOBAL innodb_buffer_pool_load_now = ON; +Timeout in wait_condition.inc for SELECT variable_value != 'Buffer pool(s) load completed at 171103 20:35:04' && +SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at ' + FROM information_schema.global_status +WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status' +Id User Host db Command Time State Info Progress +1 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000 +3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000 +9 root localhost test Query 0 init show full processlist 0.000 SELECT variable_value FROM information_schema.global_status WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status';   mysqltest: Result length mismatch So, the failure happens because of a delayed buffer pool load on server startup. The problem was introduced by this commit in 10.2: commit 14b1c8c80d89ab54cadedf94fa7604b8f917aecd Author: Monty Date: Wed Oct 5 01:09:52 2016 +0300   After merge and bug fixes - Fixed compiler warnings - Removed have_debug.inc from innochecksum_3 - Fixed race condition in innodb_buffer_pool_load ... particularly, but this change: --- a/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test +++ b/mysql-test/suite/sys_vars/t/innodb_buffer_pool_load_now_basic.test @@ -20,8 +20,10 @@ let $old_status= `SELECT variable_value FROM information_schema.global_status # let $wait_condition = # SELECT TRIM(SUBSTR('$old_status', -8)) != DATE_FORMAT(CURTIME(), '%k:%i:%s'); # -- source include/wait_condition.inc -if (`SELECT variable_value LIKE '%dump completed at%' FROM information_schema.global_status - WHERE LOWER(variable_name) = 'innodb_buffer_pool_dump_status'`) +if (`SELECT count(*) > 0 FROM information_schema.global_status + WHERE (LOWER(variable_name) = 'innodb_buffer_pool_dump_status' or + LOWER(variable_name) = 'innodb_buffer_pool_load_status') + and variable_value LIKE '%completed at%'`) { -- sleep 2 } @@ -42,12 +44,17 @@ let $wait_condition = -- let $file = `SELECT CONCAT(@@datadir, @@global.innodb_buffer_pool_filename)` -- file_exists $file +let $old_load_status= + `SELECT variable_value FROM information_schema.global_status + WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'`; + # Load the dump SET GLOBAL innodb_buffer_pool_load_now = ON; # Wait for the load to complete let $wait_condition = - SELECT SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at ' + SELECT variable_value != '$old_load_status' && + SUBSTR(variable_value, 1, 33) = 'Buffer pool(s) load completed at ' FROM information_schema.global_status WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'; --disable_warnings The change apparently attempted to solve this failure: @@ -8,4 +8,4 @@ FROM information_schema.global_status WHERE LOWER(variable_name) = 'innodb_buffer_pool_load_status'; variable_value -Buffer pool(s) load completed at TIMESTAMP_NOW +Loaded 1/85 pages   mysqltest: Result length mismatch but it's not accurate. Here is what happens in the test now: 1) test starts 2) test stores the value of innodb_buffer_pool_dump_status variable; 3) test checks whether innodb_buffer_pool_dump_status or innodb_buffer_pool_load_status contain 'completed' ; 4) if they do, the test waits for 2 seconds to make sure the timestamp has changed; 5) test executes buffer pool dump and checks that it's reflected in innodb_buffer_pool_dump_status – that the value has changed comparing to the old stored value; 6) test stores the value of innodb_buffer_pool_load_status (it thinks it wouldn't have changed since the beginning of the test, since we didn't do any loads yet); 7) test executes buffer pool load and checks that it's reflected in innodb_buffer_pool_dump_status – that the value has changed comparing to the old stored value If the test is running after server restart without innodb_buffer_pool_load_at_startup , then at step (3) neither of the variables would contain "completed", as neither a dump nor a load has been done yet, so the test proceeds immediately, and in both further checks, at steps (5) and (7), the value necessarily changes comparing to the original ones. If the test runs after some other test which executed either a dump or a load, or if the load has already happened on server startup, then on step 4 it waits for 2 seconds, and then at further checks at steps (5) and (7) the values will also change, because they'll contain different timestamps. The problem happens if the load is delayed, as imitated by the injection above. In this case, when the test checks values at step (3), they still don't contain any "completed", so it proceeds. And after that, it (wrongly) assumes that the values cannot change any longer, except for when we expect it. However, by step 6, the delayed load on startup finishes, the value gets the current timestamp, and thus when the test executes step (7), the new value gets the same timestamp as the previous one, and the check fails. The solution could be to split the check on step (3), to check the dump value before doing the dump, and load value before doing the load, not both at once. There can still be a tiny window for a race condition, but the accuracy of the checks is limited anyway, because of MDEV-9867 .

            People

              elenst Elena Stepanova
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.