|
10.0 is also affected:
http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/2297/steps/test/logs/stdio
main.show_explain 'xtradb' w2 [ fail ]
|
Test ended at 2017-01-17 21:58:18
|
|
CURRENT_TEST: main.show_explain
|
mysqltest: At line 879: query 'reap' succeeded - should have failed with errno 1317...
|
|
The result from queries just before the failure was:
|
< snip >
|
11 data1
|
12 data1
|
13 data1
|
14 data1
|
15 data1
|
16 data1
|
17 data1
|
18 data1
|
19 data1
|
20 data1
|
set autocommit=0;
|
select * from t1 where pk between 10 and 20 for update;
|
# do: send_eval show explain for thr2;
|
Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=33
|
Id User Host db Command Time State Info Progress
|
33 root localhost:61791 test Sleep 39 NULL 0.000
|
34 root localhost:61795 test Query 39 Sending data select * from t1 where pk between 10 and 20 for update 0.000
|
38 root localhost:61984 test Sleep 39 NULL 0.000
|
39 root localhost:61985 test Query 0 init show full processlist 0.000
|
kill query $thr_default;
|
|
|
on 10.3 http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/3129/steps/test/logs/stdio
main.show_explain 'innodb' w3 [ fail ]
|
Test ended at 2017-11-18 23:18:59
|
|
CURRENT_TEST: main.show_explain
|
mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
|
|
The result from queries just before the failure was:
|
< snip >
|
20 data1
|
connection con1;
|
set autocommit=0;
|
select * from t1 where pk between 10 and 20 for update;
|
connection default;
|
# do: send_eval show explain for thr2;
|
connection con3;
|
Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=51
|
Id User Host db Command Time State Info Progress
|
1 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000
|
2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
|
51 root localhost test Sleep 31 NULL 0.000
|
52 root localhost test Query 31 Sending data select * from t1 where pk between 10 and 20 for update 0.000
|
56 root localhost test Sleep 31 NULL 0.000
|
57 root localhost test Query 0 Init show full processlist 0.000
|
kill query $thr_default;
|
connection default;
|
|
|
I got this locally on 10.3 (very close to mariadb-10.3.18).
|
|
This looks like a badly written test to me. See details below. Here is a recent failure from 10.4: http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/24152/steps/mtr_nm/logs/stdio
|
10.4 2f7b37b02154748b223e385a7d7787900ab37b5e
|
CURRENT_TEST: main.show_explain
|
mysqltest: At line 882: query 'reap' succeeded - should have failed with errno 1317...
|
|
The result from queries just before the failure was:
|
< snip >
|
20 data1
|
connection con1;
|
set autocommit=0;
|
select * from t1 where pk between 10 and 20 for update;
|
connection default;
|
# do: send_eval show explain for thr2;
|
connection con3;
|
Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=234
|
Id User Host db Command Time State Info Progress
|
1 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000
|
3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
|
234 root localhost test Sleep 30 NULL 0.000
|
235 root localhost test Query 30 Sending data select * from t1 where pk between 10 and 20 for update 0.000
|
239 root localhost test Sleep 30 NULL 0.000
|
240 root localhost test Query 0 Init show full processlist 0.000
|
kill query $thr_default;
|
connection default;
|
|
More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.4.15/mysql-test/var/3/log/show_explain.log
|
Let us look at the changes:
git show 105e3ae6c93f57498fa6c504dfbb92203b0d1056 -- mysql-test/t/show_explain.test
|
diff --git a/mysql-test/t/show_explain.test b/mysql-test/t/show_explain.test
|
index 91e3be34408..6735a0e851e 100644
|
--- a/mysql-test/t/show_explain.test
|
+++ b/mysql-test/t/show_explain.test
|
@@ -224,7 +224,9 @@ connection default;
|
--source include/wait_condition.inc
|
--error ER_TARGET_NOT_EXPLAINABLE
|
evalp show explain for $thr2;
|
+evalp kill query $thr2;
|
connection con1;
|
+--error ER_QUERY_INTERRUPTED
|
reap;
|
set debug_dbug=@old_debug;
|
|
@@ -399,7 +401,9 @@ connection default;
|
--source include/wait_condition.inc
|
--error ER_TARGET_NOT_EXPLAINABLE
|
evalp show explain for $thr2;
|
+evalp kill query $thr2;
|
connection con1;
|
+--error ER_QUERY_INTERRUPTED
|
reap;
|
set debug_dbug=@old_debug;
|
DROP VIEW v1;
|
The test is using DBUG, but for some reason, it is missing proper DEBUG_SYNC instrumentation to ensure that the KILL QUERY will be processed before the statement that was submitted to execution earlier was completed.
Actually, the above mentioned commit is merely duplicating earlier use of such pattern. The error at line 882 was introduced even earlier:
commit 3e90dc1f77dc3fa51d542bf82a336753310f7776
|
Author: Sergey Petrunya <psergey@askmonty.org>
|
Date: Thu Jul 5 22:04:13 2012 +0400
|
|
MWL#182: Explain running statements
|
- Make SHOW EXPLAIN command be KILLable with KILL QUERY.
|
|
|
I made this a blocker as this affects protected branches and thus can stop people from pushing
|
|
I was able to reproduce once
CURRENT_TEST: main.show_explain15
|
mysqltest: At line 880: query 'reap' succeeded - should have failed with errno 1317...
|
--- good-execution.log
|
+++ bad-execution.log
|
@@ -966,82 +966,16 @@
|
connection default;
|
# do: send_eval show explain for thr2;
|
connection con3;
|
+Timeout in wait_condition.inc for select State='show explain' from information_schema.processlist where id=139
|
+Id User Host db Command Time State Info Progress
|
+1 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
+2 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
+3 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000
|
+4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
|
+5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
|
+139 root localhost test Sleep 30 NULL 0.000
|
+140 root localhost test Query 30 Sending data select * from t1 where pk between 10 and 20 for update 0.000
|
+144 root localhost test Sleep 30 NULL 0.000
|
+145 root localhost test Query 0 Init show full processlist 0.000
|
kill query $thr_default;
|
connection default;
|
-ERROR 70100: Query execution was interrupted
|
-connection con2;
|
The test was expecting to find the SHOW EXPLAIN query to be stuck waiting to get explain on a thread that's stuck in innodb lock wait.
The query that's stuck in innodb lock wait is there (id=140). The SHOW EXPLAIN has managed to succeed somehow.
|
|
Could something like this happen if InnoDB lock wait experiences a spurious wakeup (caused by either pthreads or some other reason)? The wakeup will process APCs, including the APC to produce EXPLAIN output...
|
|
...no, the cause seems to be the following issue in the test. Look at these lines in the test:
# This will freeze
|
send
|
select * from t1 where pk between 10 and 20 for update;
|
# run SHOW EXPLAIN on a frozen thread
|
connection default;
|
let $save_wait_condition= $wait_condition;
|
let $wait_condition= select State='Sending data' from information_schema.processlist where id=$thr2;
|
--source include/wait_condition.inc
|
Here, the test expects that the above for update query is blocked inside
InnoDB, waiting for a lock.
However, let's suppose it hasn't reached the lock acquistion yet and is still
at the start of do_select() (which JOIN::exec_inner calls right after
calling THD_STAGE_INFO(thd, stage_sending_data) call)
Then the SHOW EXPLAIN call
--echo # do: send_eval show explain for thr2;
|
--disable_query_log
|
send_eval show explain for $thr2;
|
--enable_query_log
|
has a chance to finish: the SELECT...FOR UPDATE query will call
thd->apc_target.process_apcs() before it is stuck waiting on an innodb lock.
|
|
Candidate patch: http://lists.askmonty.org/pipermail/commits/2021-May/014592.html . I would like to get a review for this from somebody in InnoDB team. marko could you please take a look?
|
|
The patch looks otherwise OK, but it forgot to load the plugins:
At line 43: query 'let $success= `$wait_condition`' failed with wrong errno 1109: 'Unknown table 'INNODB_LOCK_WAITS' in information_schema', instead of 0...
|
I pushed an amended version of the patch to bb-10.2-release.
|