Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-37011

Using FirstMatch() in SEMI JOIN maybe a better choice

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Minor
    • Resolution: Unresolved
    • 11.8.2
    • 11.8
    • Optimizer
    • None

    Description

      Hi, MariaDB developers,

      Thank you for taking the time to review this report. I'd like to suggest an optimization opportunity regarding SEMI JOIN operations.

      SEMI JOINs often perform better with FirstMatch(), yet the optimizer of MariaDB does not choose it. Since these joins are frequent, prioritizing this strategy could yield broad speedups.

      You can reproduce it as follows. As you can see, the SEMI JOIN query consumes 42.779 sec, even though there is only one row in t1;

      CREATE TABLE t1(c1 INT8);
      CREATE TABLE t2(c2 INT8);
      INSERT INTO t2 SELECT seq FROM seq_1_to_10000000;
       
      -- FistMatch at the beginning value
      INSERT INTO t1 VALUES(1);
      MariaDB [test]> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2);
      +------+
      | c3   |
      +------+
      |    1 |
      +------+
      1 row in set (42.779 sec)
       
      -- FistMatch at the middle value
      DELETE FROM t1;
      INSERT INTO t1 VALUES(5000000);
      MariaDB [test]> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2);
      +---------+
      | c1      |
      +---------+
      | 5000000 |
      +---------+
      1 row in set (41.751 sec)
       
      -- FistMatch at the ending value
      DELETE FROM t1;
      INSERT INTO t1 VALUES(10000000);
      MariaDB [test]> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2);
      +----------+
      | c1       |
      +----------+
      | 10000000 |
      +----------+
      1 row in set (41.898 sec)
       
      MariaDB [test]> explain SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2) \G
      *************************** 1. row ***************************
                 id: 1
        select_type: PRIMARY
              table: t1
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 1
              Extra: 
      *************************** 2. row ***************************
                 id: 1
        select_type: PRIMARY
              table: <subquery2>
               type: eq_ref
      possible_keys: distinct_key
                key: distinct_key
            key_len: 8
                ref: func
               rows: 1
              Extra: 
      *************************** 3. row ***************************
                 id: 2
        select_type: MATERIALIZED
              table: t2
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 9728224
              Extra: 
      3 rows in set (0.011 sec)
      

      The ANTI JOIN is a positive case, which may has adopted FirstMatch(). As shown below, the time consumed varies depending on the position of the match. Although the query plan does not specify whether FirstMatch() is used, I presume that it is used.

      -- FistMatch at the beginning value
      INSERT INTO t1 VALUES(1);
      SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2);
      Empty set (0.012 sec)
       
      -- FistMatch at the middle value
      DELETE FROM t1;
      INSERT INTO t1 VALUES(5000000);
      SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2);
      Empty set (1.135 sec)
       
      -- FistMatch at the ending value
      DELETE FROM t1;
      INSERT INTO t1 VALUES(10000000);
      SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2);
      Empty set (2.284 sec)
       
       
      explain SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2);
      *************************** 1. row ***************************
                 id: 1
        select_type: PRIMARY
              table: t1
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 1
              Extra: Using where
      *************************** 2. row ***************************
                 id: 2
        select_type: DEPENDENT SUBQUERY
              table: t2
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 9728224
              Extra: Using where
      2 rows in set (0.001 sec)
      

      Thus, my expected behavior is that the SEMI JOIN query should consume less time. For example, other DBMSs like MySQL has adopted FirstMatch() in SEMI JOIN, which only consumes 1/10 time of MairaDB when executing the same SEMI JOIN query. I am sorry that I used this comparison to illustrate this report. In my opinion, both MariaDB and MySQL are among the best RDBMSs.

      mysql> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2);
      +------+
      | c1   |
      +------+
      |    1 |
      +------+
      1 row in set (4.060 sec)
      mysql> explain SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2) \G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: t1
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 1
           filtered: 100.00
              Extra: NULL
      *************************** 2. row ***************************
                 id: 1
        select_type: SIMPLE
              table: t2
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 9746243
           filtered: 10.00
              Extra: Using where; FirstMatch(t1); Using join buffer (hash join)
      2 rows in set, 2 warnings (0.001 sec)
      

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            jinhui lai jinhui lai
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.