Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10674

main.show_explain failed in buildbot

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/bld-dan-debug/builds/3241/steps/test_1/logs/stdio
      (test run with ps-protocol)

      main.show_explain 'xtradb'               w1 [ fail ]
              Test ended at 2016-05-08 15:26:44
       
      CURRENT_TEST: main.show_explain
      mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
       
      The result from queries just before the failure was:
      < snip >
      11	data1
      12	data1
      13	data1
      14	data1
      15	data1
      16	data1
      17	data1
      18	data1
      19	data1
      20	data1
      set autocommit=0;
      select * from t1 where pk between 10 and 20 for update;
      # do: send_eval show explain for thr2;
      Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=33
      Id	User	Host	db	Command	Time	State	Info	Progress
      33	root	localhost	test	Sleep	30		NULL	0.000
      34	root	localhost	test	Query	30	Sending data	select * from t1 where pk between 10 and 20 for update	0.000
      38	root	localhost	test	Sleep	30		NULL	0.000
      39	root	localhost	test	Query	0	init	show full processlist	0.000
      kill query $thr_default;
      

      Attachments

        Issue Links

          Activity

            I was able to reproduce once

            CURRENT_TEST: main.show_explain15
            mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
            

            --- good-execution.log
            +++ bad-execution.log
            @@ -966,82 +966,16 @@ 
             connection default;
             # do: send_eval show explain for thr2;
             connection con3;
            +Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=139
            +Id     User    Host    db      Command Time    State   Info    Progress
            +1      system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
            +2      system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
            +3      system user             NULL    Daemon  NULL    InnoDB purge coordinator        NULL    0.000
            +4      system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
            +5      system user             NULL    Daemon  NULL    InnoDB shutdown handler NULL    0.000
            +139    root    localhost       test    Sleep   30              NULL    0.000
            +140    root    localhost       test    Query   30      Sending data    select * from t1 where pk between 10 and 20 for update  0.000
            +144    root    localhost       test    Sleep   30              NULL    0.000
            +145    root    localhost       test    Query   0       Init    show full processlist   0.000
             kill query $thr_default;
             connection default;
            -ERROR 70100: Query execution was interrupted
            -connection con2;
            

            The test was expecting to find the SHOW EXPLAIN query to be stuck waiting to get explain on a thread that's stuck in innodb lock wait.

            The query that's stuck in innodb lock wait is there (id=140). The SHOW EXPLAIN has managed to succeed somehow.

            psergei Sergei Petrunia added a comment - I was able to reproduce once CURRENT_TEST: main.show_explain15 mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317... --- good-execution.log +++ bad-execution.log @@ -966,82 +966,16 @@ connection default; # do: send_eval show explain for thr2; connection con3; +Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=139 +Id User Host db Command Time State Info Progress +1 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +3 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000 +4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000 +5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000 +139 root localhost test Sleep 30 NULL 0.000 +140 root localhost test Query 30 Sending data select * from t1 where pk between 10 and 20 for update 0.000 +144 root localhost test Sleep 30 NULL 0.000 +145 root localhost test Query 0 Init show full processlist 0.000 kill query $thr_default; connection default; -ERROR 70100: Query execution was interrupted -connection con2; The test was expecting to find the SHOW EXPLAIN query to be stuck waiting to get explain on a thread that's stuck in innodb lock wait. The query that's stuck in innodb lock wait is there (id=140). The SHOW EXPLAIN has managed to succeed somehow.

            Could something like this happen if InnoDB lock wait experiences a spurious wakeup (caused by either pthreads or some other reason)? The wakeup will process APCs, including the APC to produce EXPLAIN output...

            psergei Sergei Petrunia added a comment - Could something like this happen if InnoDB lock wait experiences a spurious wakeup (caused by either pthreads or some other reason)? The wakeup will process APCs, including the APC to produce EXPLAIN output...

            ...no, the cause seems to be the following issue in the test. Look at these lines in the test:

            # This will freeze
            send
            select * from t1 where pk between 10 and 20 for update;
            

            # run SHOW EXPLAIN on a frozen thread
            connection default;
            let $save_wait_condition= $wait_condition;
            let $wait_condition= select State='Sending data' from information_schema.processlist where id=$thr2;
            --source include/wait_condition.inc
            

            Here, the test expects that the above for update query is blocked inside
            InnoDB, waiting for a lock.
            However, let's suppose it hasn't reached the lock acquistion yet and is still
            at the start of do_select() (which JOIN::exec_inner calls right after
            calling THD_STAGE_INFO(thd, stage_sending_data) call)

            Then the SHOW EXPLAIN call

            --echo # do: send_eval show explain for thr2;
            --disable_query_log
            send_eval show explain for $thr2;
            --enable_query_log
            

            has a chance to finish: the SELECT...FOR UPDATE query will call
            thd->apc_target.process_apcs() before it is stuck waiting on an innodb lock.

            psergei Sergei Petrunia added a comment - ...no, the cause seems to be the following issue in the test. Look at these lines in the test: # This will freeze send select * from t1 where pk between 10 and 20 for update; # run SHOW EXPLAIN on a frozen thread connection default; let $save_wait_condition= $wait_condition; let $wait_condition= select State='Sending data' from information_schema.processlist where id=$thr2; --source include/wait_condition.inc Here, the test expects that the above for update query is blocked inside InnoDB, waiting for a lock. However, let's suppose it hasn't reached the lock acquistion yet and is still at the start of do_select() (which JOIN::exec_inner calls right after calling THD_STAGE_INFO(thd, stage_sending_data) call) Then the SHOW EXPLAIN call --echo # do: send_eval show explain for thr2; --disable_query_log send_eval show explain for $thr2; --enable_query_log has a chance to finish: the SELECT...FOR UPDATE query will call thd->apc_target.process_apcs() before it is stuck waiting on an innodb lock.
            psergei Sergei Petrunia added a comment - - edited

            Candidate patch: http://lists.askmonty.org/pipermail/commits/2021-May/014592.html . I would like to get a review for this from somebody in InnoDB team. marko could you please take a look?

            psergei Sergei Petrunia added a comment - - edited Candidate patch: http://lists.askmonty.org/pipermail/commits/2021-May/014592.html . I would like to get a review for this from somebody in InnoDB team. marko could you please take a look?

            The patch looks otherwise OK, but it forgot to load the plugins:

            At line 43: query 'let $success= `$wait_condition`' failed with wrong errno 1109: 'Unknown table 'INNODB_LOCK_WAITS' in information_schema', instead of 0...
            

            I pushed an amended version of the patch to bb-10.2-release.

            marko Marko Mäkelä added a comment - The patch looks otherwise OK, but it forgot to load the plugins: At line 43: query 'let $success= `$wait_condition`' failed with wrong errno 1109: 'Unknown table 'INNODB_LOCK_WAITS' in information_schema', instead of 0... I pushed an amended version of the patch to bb-10.2-release.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.