[MDEV-35331] Performance Issue on TPC-H Query 7 - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 11.7.0
Fix Version/s: 11.7(EOL)
Component/s: Optimizer
Labels:
None

Description

For query 7 in the TPC-H benchmark;

select

    supp_nation,

    cust_nation,

    l_year,

    sum(volume) as revenue

from

        select

            n1.n_name as supp_nation,

            n2.n_name as cust_nation,

            extract(

                year

                from

                    l_shipdate

            ) as l_year,

            l_extendedprice * (1 - l_discount) as volume

        from

            SUPPLIER,

            LINEITEM,

            ORDERS,

            CUSTOMER,

            NATION n1,

            NATION n2

        where

            s_suppkey = l_suppkey

            and o_orderkey = l_orderkey

            and c_custkey = o_custkey

            and s_nationkey = n1.n_nationkey

            and c_nationkey = n2.n_nationkey

            and (

                    n1.n_name = 'JAPAN'

                    and n2.n_name = 'INDIA'

                or (

                    n1.n_name = 'INDIA'

                    and n2.n_name = 'JAPAN'

            and l_shipdate between date '1995-01-01'

            and date '1996-12-31'

    ) as shipping

group by

    supp_nation,

    cust_nation,

    l_year

order by

    supp_nation,

    cust_nation,

    l_year;

When executing with ANALYZE, its query plan and execution time are:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra

1	SIMPLE	ORDERS	index	PRIMARY,ORDERS_FK1	ORDERS_FK1	4	NULL	1488600	1500000.00	100.00	100.00	Using index; Using temporary; Using filesort

1	SIMPLE	CUSTOMER	eq_ref	PRIMARY,CUSTOMER_FK1	PRIMARY	4	tpch.ORDERS.O_CUSTKEY	1	1.00	100.00	100.00

1	SIMPLE	n2	eq_ref	PRIMARY	PRIMARY	4	tpch.CUSTOMER.C_NATIONKEY	1	1.00	100.00	7.95	Using where

1	SIMPLE	LINEITEM	ref	PRIMARY	PRIMARY	4	tpch.ORDERS.O_ORDERKEY	3	4.00	100.00	30.38	Using where

1	SIMPLE	SUPPLIER	eq_ref	PRIMARY,SUPPLIER_FK1	PRIMARY	4	tpch.LINEITEM.L_SUPPKEY	1	1.00	100.00	100.00

1	SIMPLE	n1	eq_ref	PRIMARY	PRIMARY	4	tpch.SUPPLIER.S_NATIONKEY	1	1.00	100.00	4.10	Using where

0.00s user 0.00s system 0% cpu 11.165 total

If we apply the following patch:

diff --git a/sql/sql_select.cc b/sql/sql_select.cc

index d6e9818a5a2..eac319b24d3 100644

--- a/sql/sql_select.cc

+++ b/sql/sql_select.cc

@@ -12707,23 +12707,6 @@ prev_record_reads(const POSITION *position, uint idx, table_map found_ref,

   for (const POSITION *pos= cur_pos-1; pos != pos_end; pos--)

-    if (found_ref & pos->table->table->map)

-    {

-      /* Found a table we depend on */

-      found_ref= ~pos->table->table->map;

-      if (!found_ref)

-      {

-        /*

-          No more dependencies. We can use the cached values to improve things

-          a bit

-        */

-        if (pos->type == JT_EQ_REF)

-          found= COST_MULT(found, pos->identical_keys);

-        else if (pos->use_join_buffer)

-          found= COST_MULT(found, pos->loops / pos->refills);

-      }

-      break;

-    }

     if (unlikely(pos->use_join_buffer))

       /* Each refill can change the cached key */

The query plan and the execution time are:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra

1	SIMPLE	LINEITEM	ALL	PRIMARY	NULL	NULL	NULL	5797664	6001215.00	100.00	30.47	Using where; Using temporary; Using filesort

1	SIMPLE	SUPPLIER	eq_ref	PRIMARY,SUPPLIER_FK1	PRIMARY	4	tpch.LINEITEM.L_SUPPKEY	1	1.00	100.00	100.00

1	SIMPLE	ORDERS	eq_ref	PRIMARY,ORDERS_FK1	PRIMARY	4	tpch.LINEITEM.L_ORDERKEY	1	1.00	100.00	100.00

1	SIMPLE	CUSTOMER	eq_ref	PRIMARY,CUSTOMER_FK1	PRIMARY	4	tpch.ORDERS.O_CUSTKEY	1	1.00	100.00	100.00

1	SIMPLE	n2	eq_ref	PRIMARY	PRIMARY	4	tpch.CUSTOMER.C_NATIONKEY	1	1.00	100.00	7.92	Using where

1	SIMPLE	n1	eq_ref	PRIMARY	PRIMARY	4	tpch.SUPPLIER.S_NATIONKEY	1	1.00	100.00	4.10	Using where

0.01s user 0.00s system 0% cpu 6.775 total

The performance seems unexpectedly improved after the patch. I am not proposing a patch to directly modify the code. I am wondering whether we can optimize the logic of function prev_record_reads() to enable the second query plan for a better performance.

I used 1 GB of data for the TPC-H benchmark. Configuring TPC-H requires some effort, so I also attach my entire database for your reference: https://drive.google.com/file/d/1PQF7FOHu2VQYUC9aUG69M2u_7BgNCeN_/view?usp=sharing
The username is root. and the password is root.

Attachments

Activity

There are no comments yet on this issue.

People

Assignee:: Sergei Petrunia

Reporter:: Jinsheng Ba

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2024-11-04 07:38

Updated:: 2024-11-14 19:09

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server