Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
For the query 18 in the TPC-H benchmark:
select
|
c_name,
|
c_custkey,
|
o_orderkey,
|
o_orderdate,
|
o_totalprice,
|
sum(l_quantity) |
from
|
CUSTOMER,
|
ORDERS,
|
LINEITEM
|
where
|
o_orderkey in ( |
select |
l_orderkey
|
from |
LINEITEM
|
group by |
l_orderkey
|
having |
sum(l_quantity) > 314 |
)
|
and c_custkey = o_custkey |
and o_orderkey = l_orderkey |
group by |
c_name,
|
c_custkey,
|
o_orderkey,
|
o_orderdate,
|
o_totalprice
|
order by |
o_totalprice desc, |
o_orderdate
|
limit
|
100;
|
When executing with ANALYZE, its query plan and execution time are:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
1 PRIMARY ORDERS ALL PRIMARY,ORDERS_FK1 NULL NULL NULL 1488600 1500000.00 100.00 100.00 Using temporary; Using filesort
|
1 PRIMARY CUSTOMER eq_ref PRIMARY PRIMARY 4 tpch.ORDERS.O_CUSTKEY 1 1.00 100.00 100.00
|
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 4 tpch.ORDERS.O_ORDERKEY 1 0.00 100.00 100.00
|
1 PRIMARY LINEITEM ref PRIMARY PRIMARY 4 tpch.ORDERS.O_ORDERKEY 3 7.00 100.00 100.00
|
2 MATERIALIZED LINEITEM index NULL PRIMARY 8 NULL 5797664 6001215.00 100.00 100.00
|
|
0.00s user 0.00s system 0% cpu 5.091 total
|
If we disable the semijoin optimization:
SET SESSION optimizer_switch='semijoin=off; |
The execution time is significantly improved:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
1 PRIMARY ORDERS ALL PRIMARY,ORDERS_FK1 NULL NULL NULL 1488600 1500000.00 100.00 0.00 Using where; Using temporary; Using filesort
|
1 PRIMARY CUSTOMER eq_ref PRIMARY PRIMARY 4 tpch.ORDERS.O_CUSTKEY 1 1.00 100.00 100.00
|
1 PRIMARY LINEITEM ref PRIMARY PRIMARY 4 tpch.ORDERS.O_ORDERKEY 3 7.00 100.00 100.00
|
2 MATERIALIZED LINEITEM index NULL PRIMARY 8 NULL 5797664 6001215.00 100.00 100.00
|
|
0.01s user 0.00s system 0% cpu 2.676 total
|
The performance seems unexpectedly improved after disabling the semijoin optimization. I observer that only disabling the semijoin for the SESSION works. If we execute SET GLOBAL optimizer_switch='semijoin=off';, the performance has no difference.
I used 1 GB of data for the TPC-H benchmark. Configuring TPC-H requires some effort, so I also attach my entire database for your reference: https://drive.google.com/file/d/1PQF7FOHu2VQYUC9aUG69M2u_7BgNCeN_/view?usp=sharing
The username is root. and the password is root.