[MDEV-30408] Spider test using semijoin=off returns wrong result (zero rows) Created: 2023-01-13  Updated: 2023-12-05

Status: Confirmed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - Spider
Affects Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3

Type: Bug Priority: Major
Reporter: Michael Widenius Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: spider-gbh


 Description   

I only tested this on 10.11. The bug could also be in earlier versions.

Applying this diff:

diff --git a/storage/spider/mysql-test/spider/bg/t/basic_sql.test b/storage/spider/mysql-test/spider/bg/t/basic_sql.test
index 9cf4ce99dc3..6a35fe8dd70 100644
--- a/storage/spider/mysql-test/spider/bg/t/basic_sql.test
+++ b/storage/spider/mysql-test/spider/bg/t/basic_sql.test
@@ -156,6 +156,7 @@ if ($USE_CHILD_GROUP2)
   }
 }
 --connection master_1
+set @@optimizer_switch="semijoin=off";
 SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_l ORDER BY a;
 
 --echo

Causes the following result change:
mtr spider/bg.basic_sql

+++ /home/my/maria-10.11/storage/spider/mysql-test/spider/bg/r/basic_sql.reject2023-01-13 20:10:47.205034143 +0200
@@ -49,6 +49,7 @@
 ) MASTER_1_ENGINE MASTER_1_CHARSET MASTER_1_COMMENT_2_1
 SELECT a, b, c FROM tb_l
 connection master_1;
+set @@optimizer_switch="semijoin=off";
 SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_l ORDER BY a;
 a      b       date_format(c, '%Y-%m-%d %H:%i:%s')
 1      a       2008-08-01 10:21:39
@@ -279,11 +280,6 @@
 SELECT a.a, a.b, date_format(a.c, '%Y-%m-%d %H:%i:%s') FROM tb_l a WHERE
 EXISTS (SELECT * FROM ta_l b WHERE b.b = a.b) ORDER BY a.a;
 a      b       date_format(a.c, '%Y-%m-%d %H:%i:%s')
-1      f       2008-07-01 10:21:39
-2      g       2000-02-01 00:00:00
-3      j       2007-05-04 20:03:11
-4      i       2003-10-30 05:01:03
-5      h       2001-10-31 23:59:59
 

select using pushdown



 Comments   
Comment by Yuchen Pei [ 2023-12-05 ]

Somehow with the semijoin=off optimizer switch, it uses the group by
handler (gbh) for the EXISTS subquery. Because such subqueries set
explicit_limit to true (see
Item_exists_subselect::fix_length_and_dec()), the gbh uses the default
limit value 0, which causes the subquery to return empty results.

Either we make gbh use the correct limit 1 in this case, or ban gbh
from being used. The latter requires checking why the optimizer switch
causes gbh to be used for the subquery.

Generated at Thu Feb 08 10:16:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.