Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
11.7.0
-
None
Description
For the query 4 in the TPC-H benchmark:
select
|
o_orderpriority,
|
count(*) as order_count |
from
|
ORDERS
|
where
|
o_orderdate >= date '1995-01-01' |
and o_orderdate < date '1995-01-01' + interval '3' month |
and exists ( |
select |
*
|
from |
LINEITEM
|
where |
l_orderkey = o_orderkey
|
and l_commitdate < l_receiptdate |
)
|
group by |
o_orderpriority
|
order by |
o_orderpriority;
|
When executing with ANALYZE, its query plan and execution time are:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
1 PRIMARY ORDERS ALL PRIMARY NULL NULL NULL 1488600 1500000.00 100.00 3.77 Using where; Using temporary; Using filesort
|
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 func 1 0.92 100.00 100.00
|
2 MATERIALIZED LINEITEM ALL PRIMARY NULL NULL NULL 5797664 6001215.00 100.00 63.21 Using where
|
|
|
0.00s user 0.01s system 0% cpu 8.147 total
|
If we apply the following patch:
diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
|
index 98a5450f0e1..fc11af0b568 100644
|
--- a/sql/item_subselect.cc
|
+++ b/sql/item_subselect.cc
|
@@ -3010,7 +3010,6 @@ static bool check_equality_for_exist2in(Item_func *func,
|
Item **outer_exp)
|
{
|
Item **args;
|
- if (func->functype() != Item_func::EQ_FUNC)
|
return FALSE;
|
DBUG_ASSERT(func->argument_count() == 2);
|
args= func->arguments();
|
The execution time is significantly improved:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
1 PRIMARY ORDERS ALL NULL NULL NULL NULL 1488600 1500000.00 100.00 3.46 Using where; Using temporary; Using filesort
|
2 DEPENDENT SUBQUERY LINEITEM ref PRIMARY PRIMARY 4 tpch.ORDERS.O_ORDERKEY 3 1.45 100.00 63.17 Using where
|
|
|
0.01s user 0.00s system 0% cpu 3.173 total
|
The performance seems unexpectedly improved after the patch. I am not proposing a patch to directly modify the code. I am wondering whether we can optimize the logic around the patch to enable the second query plan for better performance.
I used 1 GB of data for the TPC-H benchmark. Configuring TPC-H requires some effort, so I also attach my entire database for your reference: https://drive.google.com/file/d/1PQF7FOHu2VQYUC9aUG69M2u_7BgNCeN_/view?usp=sharing
The username is root. and the password is root.