Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35332

Performance Issue on TPC-H Query 4

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 11.7.0
    • 11.7
    • Optimizer
    • None

    Description

      For the query 4 in the TPC-H benchmark:

      select
          o_orderpriority,
          count(*) as order_count
      from
          ORDERS
      where
          o_orderdate >= date '1995-01-01'
          and o_orderdate < date '1995-01-01' + interval '3' month
          and exists (
              select
                  *
              from
                  LINEITEM
              where
                  l_orderkey = o_orderkey
                  and l_commitdate < l_receiptdate
          )
      group by
          o_orderpriority
      order by
          o_orderpriority;
      

      When executing with ANALYZE, its query plan and execution time are:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
      1	PRIMARY	ORDERS	ALL	PRIMARY	NULL	NULL	NULL	1488600	1500000.00	100.00	3.77	Using where; Using temporary; Using filesort
      1	PRIMARY	<subquery2>	eq_ref	distinct_key	distinct_key	4	func	1	0.92	100.00	100.00	
      2	MATERIALIZED	LINEITEM	ALL	PRIMARY	NULL	NULL	NULL	5797664	6001215.00	100.00	63.21	Using where
       
       
      0.00s user 0.01s system 0% cpu 8.147 total
      

      If we apply the following patch:

      diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
      index 98a5450f0e1..fc11af0b568 100644
      --- a/sql/item_subselect.cc
      +++ b/sql/item_subselect.cc
      @@ -3010,7 +3010,6 @@ static bool check_equality_for_exist2in(Item_func *func,
                                               Item **outer_exp)
       {
         Item **args;
      -  if (func->functype() != Item_func::EQ_FUNC)
           return FALSE;
         DBUG_ASSERT(func->argument_count() == 2);
         args= func->arguments();
      

      The execution time is significantly improved:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
      1	PRIMARY	ORDERS	ALL	NULL	NULL	NULL	NULL	1488600	1500000.00	100.00	3.46	Using where; Using temporary; Using filesort
      2	DEPENDENT SUBQUERY	LINEITEM	ref	PRIMARY	PRIMARY	4	tpch.ORDERS.O_ORDERKEY	3	1.45	100.00	63.17	Using where
       
       
      0.01s user 0.00s system 0% cpu 3.173 total
      

      The performance seems unexpectedly improved after the patch. I am not proposing a patch to directly modify the code. I am wondering whether we can optimize the logic around the patch to enable the second query plan for better performance.

      I used 1 GB of data for the TPC-H benchmark. Configuring TPC-H requires some effort, so I also attach my entire database for your reference: https://drive.google.com/file/d/1PQF7FOHu2VQYUC9aUG69M2u_7BgNCeN_/view?usp=sharing
      The username is root. and the password is root.

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            bajinsheng Jinsheng Ba
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.