Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10674

main.show_explain failed in buildbot

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/bld-dan-debug/builds/3241/steps/test_1/logs/stdio
      (test run with ps-protocol)

      main.show_explain 'xtradb'               w1 [ fail ]
              Test ended at 2016-05-08 15:26:44
       
      CURRENT_TEST: main.show_explain
      mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
       
      The result from queries just before the failure was:
      < snip >
      11	data1
      12	data1
      13	data1
      14	data1
      15	data1
      16	data1
      17	data1
      18	data1
      19	data1
      20	data1
      set autocommit=0;
      select * from t1 where pk between 10 and 20 for update;
      # do: send_eval show explain for thr2;
      Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=33
      Id	User	Host	db	Command	Time	State	Info	Progress
      33	root	localhost	test	Sleep	30		NULL	0.000
      34	root	localhost	test	Query	30	Sending data	select * from t1 where pk between 10 and 20 for update	0.000
      38	root	localhost	test	Sleep	30		NULL	0.000
      39	root	localhost	test	Query	0	init	show full processlist	0.000
      kill query $thr_default;
      

      Attachments

        Issue Links

          Activity

            10.0 is also affected:
            http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/2297/steps/test/logs/stdio

            main.show_explain 'xtradb'               w2 [ fail ]
                    Test ended at 2017-01-17 21:58:18
             
            CURRENT_TEST: main.show_explain
            mysqltest: At line 879: query 'reap' succeeded - should have failed with errno 1317...
             
            The result from queries just before the failure was:
            < snip >
            11	data1
            12	data1
            13	data1
            14	data1
            15	data1
            16	data1
            17	data1
            18	data1
            19	data1
            20	data1
            set autocommit=0;
            select * from t1 where pk between 10 and 20 for update;
            # do: send_eval show explain for thr2;
            Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=33
            Id	User	Host	db	Command	Time	State	Info	Progress
            33	root	localhost:61791	test	Sleep	39		NULL	0.000
            34	root	localhost:61795	test	Query	39	Sending data	select * from t1 where pk between 10 and 20 for update	0.000
            38	root	localhost:61984	test	Sleep	39		NULL	0.000
            39	root	localhost:61985	test	Query	0	init	show full processlist	0.000
            kill query $thr_default;
            

            elenst Elena Stepanova added a comment - 10.0 is also affected: http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/2297/steps/test/logs/stdio main.show_explain 'xtradb' w2 [ fail ] Test ended at 2017-01-17 21:58:18   CURRENT_TEST: main.show_explain mysqltest: At line 879: query 'reap' succeeded - should have failed with errno 1317...   The result from queries just before the failure was: < snip > 11 data1 12 data1 13 data1 14 data1 15 data1 16 data1 17 data1 18 data1 19 data1 20 data1 set autocommit=0; select * from t1 where pk between 10 and 20 for update; # do: send_eval show explain for thr2; Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=33 Id User Host db Command Time State Info Progress 33 root localhost:61791 test Sleep 39 NULL 0.000 34 root localhost:61795 test Query 39 Sending data select * from t1 where pk between 10 and 20 for update 0.000 38 root localhost:61984 test Sleep 39 NULL 0.000 39 root localhost:61985 test Query 0 init show full processlist 0.000 kill query $thr_default;
            alice Alice Sherepa added a comment -

            on 10.3 http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/3129/steps/test/logs/stdio

            main.show_explain 'innodb'               w3 [ fail ]
                    Test ended at 2017-11-18 23:18:59
             
            CURRENT_TEST: main.show_explain
            mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
             
            The result from queries just before the failure was:
            < snip >
            20	data1
            connection con1;
            set autocommit=0;
            select * from t1 where pk between 10 and 20 for update;
            connection default;
            # do: send_eval show explain for thr2;
            connection con3;
            Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=51
            Id	User	Host	db	Command	Time	State	Info	Progress
            1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
            2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
            51	root	localhost	test	Sleep	31		NULL	0.000
            52	root	localhost	test	Query	31	Sending data	select * from t1 where pk between 10 and 20 for update	0.000
            56	root	localhost	test	Sleep	31		NULL	0.000
            57	root	localhost	test	Query	0	Init	show full processlist	0.000
            kill query $thr_default;
            connection default;
            

            alice Alice Sherepa added a comment - on 10.3 http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/3129/steps/test/logs/stdio main.show_explain 'innodb' w3 [ fail ] Test ended at 2017-11-18 23:18:59   CURRENT_TEST: main.show_explain mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...   The result from queries just before the failure was: < snip > 20 data1 connection con1; set autocommit=0; select * from t1 where pk between 10 and 20 for update; connection default; # do: send_eval show explain for thr2; connection con3; Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=51 Id User Host db Command Time State Info Progress 1 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000 2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000 51 root localhost test Sleep 31 NULL 0.000 52 root localhost test Query 31 Sending data select * from t1 where pk between 10 and 20 for update 0.000 56 root localhost test Sleep 31 NULL 0.000 57 root localhost test Query 0 Init show full processlist 0.000 kill query $thr_default; connection default;

            I got this locally on 10.3 (very close to mariadb-10.3.18).

            marko Marko Mäkelä added a comment - I got this locally on 10.3 (very close to mariadb-10.3.18).

            This looks like a badly written test to me. See details below. Here is a recent failure from 10.4: http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/24152/steps/mtr_nm/logs/stdio

            10.4 2f7b37b02154748b223e385a7d7787900ab37b5e

            CURRENT_TEST: main.show_explain
            mysqltest: At line 882: query 'reap' succeeded - should have failed with errno 1317...
             
            The result from queries just before the failure was:
            < snip >
            20	data1
            connection con1;
            set autocommit=0;
            select * from t1 where pk between 10 and 20 for update;
            connection default;
            # do: send_eval show explain for thr2;
            connection con3;
            Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=234
            Id	User	Host	db	Command	Time	State	Info	Progress
            1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
            3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
            5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
            234	root	localhost	test	Sleep	30		NULL	0.000
            235	root	localhost	test	Query	30	Sending data	select * from t1 where pk between 10 and 20 for update	0.000
            239	root	localhost	test	Sleep	30		NULL	0.000
            240	root	localhost	test	Query	0	Init	show full processlist	0.000
            kill query $thr_default;
            connection default;
             
            More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.4.15/mysql-test/var/3/log/show_explain.log
            

            Let us look at the changes:

            git show 105e3ae6c93f57498fa6c504dfbb92203b0d1056 -- mysql-test/t/show_explain.test
            

            diff --git a/mysql-test/t/show_explain.test b/mysql-test/t/show_explain.test
            index 91e3be34408..6735a0e851e 100644
            --- a/mysql-test/t/show_explain.test
            +++ b/mysql-test/t/show_explain.test
            @@ -224,7 +224,9 @@ connection default;
             --source include/wait_condition.inc
             --error ER_TARGET_NOT_EXPLAINABLE
             evalp show explain for $thr2;
            +evalp kill query $thr2;
             connection con1;
            +--error ER_QUERY_INTERRUPTED
             reap;
             set debug_dbug=@old_debug;
             
            @@ -399,7 +401,9 @@ connection default;
             --source include/wait_condition.inc
             --error ER_TARGET_NOT_EXPLAINABLE
             evalp show explain for $thr2;
            +evalp kill query $thr2;
             connection con1;
            +--error ER_QUERY_INTERRUPTED
             reap;
             set debug_dbug=@old_debug;
             DROP VIEW v1;
            

            The test is using DBUG, but for some reason, it is missing proper DEBUG_SYNC instrumentation to ensure that the KILL QUERY will be processed before the statement that was submitted to execution earlier was completed.

            Actually, the above mentioned commit is merely duplicating earlier use of such pattern. The error at line 882 was introduced even earlier:

            commit 3e90dc1f77dc3fa51d542bf82a336753310f7776
            Author: Sergey Petrunya <psergey@askmonty.org>
            Date:   Thu Jul 5 22:04:13 2012 +0400
             
                MWL#182: Explain running statements
                - Make SHOW EXPLAIN command be KILLable with KILL QUERY.
            

            marko Marko Mäkelä added a comment - This looks like a badly written test to me. See details below. Here is a recent failure from 10.4: http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/24152/steps/mtr_nm/logs/stdio 10.4 2f7b37b02154748b223e385a7d7787900ab37b5e CURRENT_TEST: main.show_explain mysqltest: At line 882: query 'reap' succeeded - should have failed with errno 1317...   The result from queries just before the failure was: < snip > 20 data1 connection con1; set autocommit=0; select * from t1 where pk between 10 and 20 for update; connection default; # do: send_eval show explain for thr2; connection con3; Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=234 Id User Host db Command Time State Info Progress 1 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000 3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000 234 root localhost test Sleep 30 NULL 0.000 235 root localhost test Query 30 Sending data select * from t1 where pk between 10 and 20 for update 0.000 239 root localhost test Sleep 30 NULL 0.000 240 root localhost test Query 0 Init show full processlist 0.000 kill query $thr_default; connection default;   More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.4.15/mysql-test/var/3/log/show_explain.log Let us look at the changes: git show 105e3ae6c93f57498fa6c504dfbb92203b0d1056 -- mysql-test/t/show_explain.test diff --git a/mysql-test/t/show_explain.test b/mysql-test/t/show_explain.test index 91e3be34408..6735a0e851e 100644 --- a/mysql-test/t/show_explain.test +++ b/mysql-test/t/show_explain.test @@ -224,7 +224,9 @@ connection default; --source include/wait_condition.inc --error ER_TARGET_NOT_EXPLAINABLE evalp show explain for $thr2; +evalp kill query $thr2; connection con1; +--error ER_QUERY_INTERRUPTED reap; set debug_dbug=@old_debug; @@ -399,7 +401,9 @@ connection default; --source include/wait_condition.inc --error ER_TARGET_NOT_EXPLAINABLE evalp show explain for $thr2; +evalp kill query $thr2; connection con1; +--error ER_QUERY_INTERRUPTED reap; set debug_dbug=@old_debug; DROP VIEW v1; The test is using DBUG , but for some reason, it is missing proper DEBUG_SYNC instrumentation to ensure that the KILL QUERY will be processed before the statement that was submitted to execution earlier was completed. Actually, the above mentioned commit is merely duplicating earlier use of such pattern. The error at line 882 was introduced even earlier: commit 3e90dc1f77dc3fa51d542bf82a336753310f7776 Author: Sergey Petrunya <psergey@askmonty.org> Date: Thu Jul 5 22:04:13 2012 +0400   MWL#182: Explain running statements - Make SHOW EXPLAIN command be KILLable with KILL QUERY.

            I made this a blocker as this affects protected branches and thus can stop people from pushing

            monty Michael Widenius added a comment - I made this a blocker as this affects protected branches and thus can stop people from pushing

            I was able to reproduce once

            CURRENT_TEST: main.show_explain15
            mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
            

            --- good-execution.log
            +++ bad-execution.log
            @@ -966,82 +966,16 @@ 
             connection default;
             # do: send_eval show explain for thr2;
             connection con3;
            +Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=139
            +Id     User    Host    db      Command Time    State   Info    Progress
            +1      system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
            +2      system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
            +3      system user             NULL    Daemon  NULL    InnoDB purge coordinator        NULL    0.000
            +4      system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
            +5      system user             NULL    Daemon  NULL    InnoDB shutdown handler NULL    0.000
            +139    root    localhost       test    Sleep   30              NULL    0.000
            +140    root    localhost       test    Query   30      Sending data    select * from t1 where pk between 10 and 20 for update  0.000
            +144    root    localhost       test    Sleep   30              NULL    0.000
            +145    root    localhost       test    Query   0       Init    show full processlist   0.000
             kill query $thr_default;
             connection default;
            -ERROR 70100: Query execution was interrupted
            -connection con2;
            

            The test was expecting to find the SHOW EXPLAIN query to be stuck waiting to get explain on a thread that's stuck in innodb lock wait.

            The query that's stuck in innodb lock wait is there (id=140). The SHOW EXPLAIN has managed to succeed somehow.

            psergei Sergei Petrunia added a comment - I was able to reproduce once CURRENT_TEST: main.show_explain15 mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317... --- good-execution.log +++ bad-execution.log @@ -966,82 +966,16 @@ connection default; # do: send_eval show explain for thr2; connection con3; +Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=139 +Id User Host db Command Time State Info Progress +1 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +3 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000 +4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000 +139 root localhost test Sleep 30 NULL 0.000 +140 root localhost test Query 30 Sending data select * from t1 where pk between 10 and 20 for update 0.000 +144 root localhost test Sleep 30 NULL 0.000 +145 root localhost test Query 0 Init show full processlist 0.000 kill query $thr_default; connection default; -ERROR 70100: Query execution was interrupted -connection con2; The test was expecting to find the SHOW EXPLAIN query to be stuck waiting to get explain on a thread that's stuck in innodb lock wait. The query that's stuck in innodb lock wait is there (id=140). The SHOW EXPLAIN has managed to succeed somehow.

            Could something like this happen if InnoDB lock wait experiences a spurious wakeup (caused by either pthreads or some other reason)? The wakeup will process APCs, including the APC to produce EXPLAIN output...

            psergei Sergei Petrunia added a comment - Could something like this happen if InnoDB lock wait experiences a spurious wakeup (caused by either pthreads or some other reason)? The wakeup will process APCs, including the APC to produce EXPLAIN output...

            ...no, the cause seems to be the following issue in the test. Look at these lines in the test:

            # This will freeze
            send
            select * from t1 where pk between 10 and 20 for update;
            

            # run SHOW EXPLAIN on a frozen thread
            connection default;
            let $save_wait_condition= $wait_condition;
            let $wait_condition= select State='Sending data' from information_schema.processlist where id=$thr2;
            --source include/wait_condition.inc
            

            Here, the test expects that the above for update query is blocked inside
            InnoDB, waiting for a lock.
            However, let's suppose it hasn't reached the lock acquistion yet and is still
            at the start of do_select() (which JOIN::exec_inner calls right after
            calling THD_STAGE_INFO(thd, stage_sending_data) call)

            Then the SHOW EXPLAIN call

            --echo # do: send_eval show explain for thr2;
            --disable_query_log
            send_eval show explain for $thr2;
            --enable_query_log
            

            has a chance to finish: the SELECT...FOR UPDATE query will call
            thd->apc_target.process_apcs() before it is stuck waiting on an innodb lock.

            psergei Sergei Petrunia added a comment - ...no, the cause seems to be the following issue in the test. Look at these lines in the test: # This will freeze send select * from t1 where pk between 10 and 20 for update; # run SHOW EXPLAIN on a frozen thread connection default; let $save_wait_condition= $wait_condition; let $wait_condition= select State='Sending data' from information_schema.processlist where id=$thr2; --source include/wait_condition.inc Here, the test expects that the above for update query is blocked inside InnoDB, waiting for a lock. However, let's suppose it hasn't reached the lock acquistion yet and is still at the start of do_select() (which JOIN::exec_inner calls right after calling THD_STAGE_INFO(thd, stage_sending_data) call) Then the SHOW EXPLAIN call --echo # do: send_eval show explain for thr2; --disable_query_log send_eval show explain for $thr2; --enable_query_log has a chance to finish: the SELECT...FOR UPDATE query will call thd->apc_target.process_apcs() before it is stuck waiting on an innodb lock.
            psergei Sergei Petrunia added a comment - - edited

            Candidate patch: http://lists.askmonty.org/pipermail/commits/2021-May/014592.html . I would like to get a review for this from somebody in InnoDB team. marko could you please take a look?

            psergei Sergei Petrunia added a comment - - edited Candidate patch: http://lists.askmonty.org/pipermail/commits/2021-May/014592.html . I would like to get a review for this from somebody in InnoDB team. marko could you please take a look?

            The patch looks otherwise OK, but it forgot to load the plugins:

            At line 43: query 'let $success= `$wait_condition`' failed with wrong errno 1109: 'Unknown table 'INNODB_LOCK_WAITS' in information_schema', instead of 0...
            

            I pushed an amended version of the patch to bb-10.2-release.

            marko Marko Mäkelä added a comment - The patch looks otherwise OK, but it forgot to load the plugins: At line 43: query 'let $success= `$wait_condition`' failed with wrong errno 1109: 'Unknown table 'INNODB_LOCK_WAITS' in information_schema', instead of 0... I pushed an amended version of the patch to bb-10.2-release.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.