Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6015

Measure impact of optimizer_switch='exists_to_in=on' in 10.0

Details

    • Task
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • None
    • None

    Description

      See https://mariadb.atlassian.net/browse/MDEV-38?focusedCommentId=28609&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28609.

      I am looking at Q4.

      ts ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
      +------+--------------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+---------------------------------------------------------------------+
      | id   | select_type        | table    | type  | possible_keys                              | key           | key_len | ref                       | rows   | filtered | Extra                                                               |
      +------+--------------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+---------------------------------------------------------------------+
      |    1 | PRIMARY            | orders   | range | i_o_orderdate                              | i_o_orderdate | 4       | NULL                      | 137994 |   100.00 | Using index condition; Using where; Using temporary; Using filesort |
      |    2 | DEPENDENT SUBQUERY | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | dbt3sf1.orders.o_orderkey |      2 |   100.00 | Using where                                                         |
      +------+--------------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+---------------------------------------------------------------------+
      2 rows in set, 2 warnings (0.00 sec)
       
      MariaDB [dbt3sf1]> set optimizer_switch='exists_to_in=on';
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [dbt3sf1]> explain extended select o_orderpriority, count(*) as order_count from orders where o_orderdate >= '1995-06-06' and o_orderdate < date_add('1995-06-06', interval 3 month) and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
      +------+-------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+--------------------------------------------------------+
      | id   | select_type | table    | type  | possible_keys                              | key           | key_len | ref                       | rows   | filtered | Extra                                                  |
      +------+-------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+--------------------------------------------------------+
      |    1 | PRIMARY     | orders   | range | PRIMARY,i_o_orderdate                      | i_o_orderdate | 4       | NULL                      | 137994 |   100.00 | Using index condition; Using temporary; Using filesort |
      |    1 | PRIMARY     | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | dbt3sf1.orders.o_orderkey |      2 |   100.00 | Using where; FirstMatch(orders)                        |
      +------+-------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+--------------------------------------------------------+
      2 rows in set, 2 warnings (0.00 sec)

      The query plan is similar. However, the second query should be able to use BKA. Run the query with and without BKA and measure the impact.

      Attachments

        1. dbt3-q22.txt
          8 kB
          Sergei Petrunia
        2. 04-opt1_explain.sql_1_results.txt
          5 kB
          Axel Schwenke
        3. 04-opt2_explain.sql_1_results.txt
          5 kB
          Axel Schwenke
        4. q4-mrr-bka-real.png
          96 kB
          Sergei Petrunia
        5. q4-mrr-real-no-exists2in.png
          98 kB
          Sergei Petrunia
        6. q4-mrr-real-no-exists2in-io-only.png
          37 kB
          Sergei Petrunia

        Activity

          The Q4 variant I am using:

          select
          	o_orderpriority,
          	count(*) as order_count
          from
          	orders
          where
          	o_orderdate >= '1995-06-06'
          	and o_orderdate < date_add('1995-06-06', interval 3 month)
          	and exists (
          		select
          			*
          		from
          			lineitem
          		where
          			l_orderkey = o_orderkey
          			and l_commitdate < l_receiptdate
          	)
          group by
          	o_orderpriority
          order by
          	o_orderpriority;

          psergei Sergei Petrunia added a comment - The Q4 variant I am using: select o_orderpriority, count(*) as order_count from orders where o_orderdate >= '1995-06-06' and o_orderdate < date_add('1995-06-06', interval 3 month) and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;

          Query Q22, EXPLAINs with exists_to_in=on|off.

          The query uses WHERE .... AND NOT EXISTS(SELECT * FROM orders WHERE o_custkey = c_custkey).

          It seems, that the best way run the subquery is to use index lookup on "o_custkey = c_custkey". This is used for both exists_to_in=ON and exists_to_in=OFF.
          BKA is not possible in either case, because the subquery uses NOT EXISTS. It is not a semi-join.

          psergei Sergei Petrunia added a comment - Query Q22, EXPLAINs with exists_to_in=on|off. The query uses WHERE .... AND NOT EXISTS(SELECT * FROM orders WHERE o_custkey = c_custkey). It seems, that the best way run the subquery is to use index lookup on "o_custkey = c_custkey". This is used for both exists_to_in=ON and exists_to_in=OFF. BKA is not possible in either case, because the subquery uses NOT EXISTS. It is not a semi-join.
          axel Axel Schwenke added a comment -

          Results from running DBT3/Q4/SF30 on MariaDB-10.0.10 with exits-to-in either on or off. Execution times:

          exists_to_in=off -> 284, 276, 280 (sec)
          exists_to_in=on -> 261, 255, 257 (sec)

          EXPLAIN plans attached.

          axel Axel Schwenke added a comment - Results from running DBT3/Q4/SF30 on MariaDB-10.0.10 with exits-to-in either on or off. Execution times: exists_to_in=off -> 284, 276, 280 (sec) exists_to_in=on -> 261, 255, 257 (sec) EXPLAIN plans attached.

          Thanks.

          Looking at the EXPLAINs. The query plans have the same structure. Exists-to-in conversion has enabled BKA for table lineitem. However, it didn't bring a lot of speedup.

          psergei Sergei Petrunia added a comment - Thanks. Looking at the EXPLAINs. The query plans have the same structure. Exists-to-in conversion has enabled BKA for table lineitem. However, it didn't bring a lot of speedup.

          Graphic trace of query's io when running with scale=1, cold buffer cache.

          psergei Sergei Petrunia added a comment - Graphic trace of query's io when running with scale=1, cold buffer cache.

          IO trace for exists_to_in=off. Things to note
          1. there is no BKA (reads go back and forth)
          2 the data seems to be highly correlated - lookups in lineitem are linear.

          psergei Sergei Petrunia added a comment - IO trace for exists_to_in=off. Things to note 1. there is no BKA (reads go back and forth) 2 the data seems to be highly correlated - lookups in lineitem are linear.

          People

            axel Axel Schwenke
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.