[MDEV-35332] Performance Issue on TPC-H Query 4 - Jira

XML

Word

Printable

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 11.7.0
Fix Version/s: 11.7
Component/s: Optimizer
Labels:
None

Description

For the query 4 in the TPC-H benchmark:

select

    o_orderpriority,

    count(*) as order_count

from

    ORDERS

where

    o_orderdate >= date '1995-01-01'

    and o_orderdate < date '1995-01-01' + interval '3' month

    and exists (

        select

        from

            LINEITEM

        where

            l_orderkey = o_orderkey

            and l_commitdate < l_receiptdate

group by

    o_orderpriority

order by

    o_orderpriority;

When executing with ANALYZE, its query plan and execution time are:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra

1	PRIMARY	ORDERS	ALL	PRIMARY	NULL	NULL	NULL	1488600	1500000.00	100.00	3.77	Using where; Using temporary; Using filesort

1	PRIMARY	<subquery2>	eq_ref	distinct_key	distinct_key	4	func	1	0.92	100.00	100.00

2	MATERIALIZED	LINEITEM	ALL	PRIMARY	NULL	NULL	NULL	5797664	6001215.00	100.00	63.21	Using where

0.00s user 0.01s system 0% cpu 8.147 total

If we apply the following patch:

diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc

index 98a5450f0e1..fc11af0b568 100644

--- a/sql/item_subselect.cc

+++ b/sql/item_subselect.cc

@@ -3010,7 +3010,6 @@ static bool check_equality_for_exist2in(Item_func *func,

                                         Item **outer_exp)

   Item **args;

-  if (func->functype() != Item_func::EQ_FUNC)

     return FALSE;

   DBUG_ASSERT(func->argument_count() == 2);

   args= func->arguments();

The execution time is significantly improved:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra

1	PRIMARY	ORDERS	ALL	NULL	NULL	NULL	NULL	1488600	1500000.00	100.00	3.46	Using where; Using temporary; Using filesort

2	DEPENDENT SUBQUERY	LINEITEM	ref	PRIMARY	PRIMARY	4	tpch.ORDERS.O_ORDERKEY	3	1.45	100.00	63.17	Using where

0.01s user 0.00s system 0% cpu 3.173 total

The performance seems unexpectedly improved after the patch. I am not proposing a patch to directly modify the code. I am wondering whether we can optimize the logic around the patch to enable the second query plan for better performance.

I used 1 GB of data for the TPC-H benchmark. Configuring TPC-H requires some effort, so I also attach my entire database for your reference: https://drive.google.com/file/d/1PQF7FOHu2VQYUC9aUG69M2u_7BgNCeN_/view?usp=sharing
The username is root. and the password is root.

Attachments

Activity

People

Assignee:: Sergei Petrunia

Reporter:: Jinsheng Ba

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2024-11-04 07:49

Updated:: 2024-11-14 19:09

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.