Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35331

Performance Issue on TPC-H Query 7

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 11.7.0
    • 11.7
    • Optimizer
    • None

    Description

      For query 7 in the TPC-H benchmark;

      select
          supp_nation,
          cust_nation,
          l_year,
          sum(volume) as revenue
      from
          (
              select
                  n1.n_name as supp_nation,
                  n2.n_name as cust_nation,
                  extract(
                      year
                      from
                          l_shipdate
                  ) as l_year,
                  l_extendedprice * (1 - l_discount) as volume
              from
                  SUPPLIER,
                  LINEITEM,
                  ORDERS,
                  CUSTOMER,
                  NATION n1,
                  NATION n2
              where
                  s_suppkey = l_suppkey
                  and o_orderkey = l_orderkey
                  and c_custkey = o_custkey
                  and s_nationkey = n1.n_nationkey
                  and c_nationkey = n2.n_nationkey
                  and (
                      (
                          n1.n_name = 'JAPAN'
                          and n2.n_name = 'INDIA'
                      )
                      or (
                          n1.n_name = 'INDIA'
                          and n2.n_name = 'JAPAN'
                      )
                  )
                  and l_shipdate between date '1995-01-01'
                  and date '1996-12-31'
          ) as shipping
      group by
          supp_nation,
          cust_nation,
          l_year
      order by
          supp_nation,
          cust_nation,
          l_year;
      

      When executing with ANALYZE, its query plan and execution time are:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
      1	SIMPLE	ORDERS	index	PRIMARY,ORDERS_FK1	ORDERS_FK1	4	NULL	1488600	1500000.00	100.00	100.00	Using index; Using temporary; Using filesort
      1	SIMPLE	CUSTOMER	eq_ref	PRIMARY,CUSTOMER_FK1	PRIMARY	4	tpch.ORDERS.O_CUSTKEY	1	1.00	100.00	100.00	
      1	SIMPLE	n2	eq_ref	PRIMARY	PRIMARY	4	tpch.CUSTOMER.C_NATIONKEY	1	1.00	100.00	7.95	Using where
      1	SIMPLE	LINEITEM	ref	PRIMARY	PRIMARY	4	tpch.ORDERS.O_ORDERKEY	3	4.00	100.00	30.38	Using where
      1	SIMPLE	SUPPLIER	eq_ref	PRIMARY,SUPPLIER_FK1	PRIMARY	4	tpch.LINEITEM.L_SUPPKEY	1	1.00	100.00	100.00	
      1	SIMPLE	n1	eq_ref	PRIMARY	PRIMARY	4	tpch.SUPPLIER.S_NATIONKEY	1	1.00	100.00	4.10	Using where
       
      0.00s user 0.00s system 0% cpu 11.165 total
      

      If we apply the following patch:

      diff --git a/sql/sql_select.cc b/sql/sql_select.cc
      index d6e9818a5a2..eac319b24d3 100644
      --- a/sql/sql_select.cc
      +++ b/sql/sql_select.cc
      @@ -12707,23 +12707,6 @@ prev_record_reads(const POSITION *position, uint idx, table_map found_ref,
       
         for (const POSITION *pos= cur_pos-1; pos != pos_end; pos--)
         {
      -    if (found_ref & pos->table->table->map)
      -    {
      -      /* Found a table we depend on */
      -      found_ref= ~pos->table->table->map;
      -      if (!found_ref)
      -      {
      -        /*
      -          No more dependencies. We can use the cached values to improve things
      -          a bit
      -        */
      -        if (pos->type == JT_EQ_REF)
      -          found= COST_MULT(found, pos->identical_keys);
      -        else if (pos->use_join_buffer)
      -          found= COST_MULT(found, pos->loops / pos->refills);
      -      }
      -      break;
      -    }
           if (unlikely(pos->use_join_buffer))
           {
             /* Each refill can change the cached key */
      

      The query plan and the execution time are:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
      1	SIMPLE	LINEITEM	ALL	PRIMARY	NULL	NULL	NULL	5797664	6001215.00	100.00	30.47	Using where; Using temporary; Using filesort
      1	SIMPLE	SUPPLIER	eq_ref	PRIMARY,SUPPLIER_FK1	PRIMARY	4	tpch.LINEITEM.L_SUPPKEY	1	1.00	100.00	100.00	
      1	SIMPLE	ORDERS	eq_ref	PRIMARY,ORDERS_FK1	PRIMARY	4	tpch.LINEITEM.L_ORDERKEY	1	1.00	100.00	100.00	
      1	SIMPLE	CUSTOMER	eq_ref	PRIMARY,CUSTOMER_FK1	PRIMARY	4	tpch.ORDERS.O_CUSTKEY	1	1.00	100.00	100.00	
      1	SIMPLE	n2	eq_ref	PRIMARY	PRIMARY	4	tpch.CUSTOMER.C_NATIONKEY	1	1.00	100.00	7.92	Using where
      1	SIMPLE	n1	eq_ref	PRIMARY	PRIMARY	4	tpch.SUPPLIER.S_NATIONKEY	1	1.00	100.00	4.10	Using where
       
      0.01s user 0.00s system 0% cpu 6.775 total
      

      The performance seems unexpectedly improved after the patch. I am not proposing a patch to directly modify the code. I am wondering whether we can optimize the logic of function prev_record_reads() to enable the second query plan for a better performance.

      I used 1 GB of data for the TPC-H benchmark. Configuring TPC-H requires some effort, so I also attach my entire database for your reference: https://drive.google.com/file/d/1PQF7FOHu2VQYUC9aUG69M2u_7BgNCeN_/view?usp=sharing
      The username is root. and the password is root.

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            bajinsheng Jinsheng Ba
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.