Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-582

LP:929732 - semijoin much slower than in_to_exists

    XMLWordPrintable

Details

    Description

      I originally commented on this bug: https://bugs.launchpad.net/maria/+bug/806894

      but now I'm not so sure that is my issue.

      With semijoin=on, I am seeing more than an order of magnitude of performance drop vs. in_to_exists.

      I will upload my dataset to the ftp server and followup afterwards. For now, query is:

      SELECT  count(*)
      FROM     v
               LEFT JOIN c
               ON       v.cid = c.cid
      WHERE    v.t        >= '2012-01-31 05:00:00'
      AND      v.t        <= '2012-02-08 07:59:59'
      AND      v.did     = '208'
      AND      c.pid          = '3124'
      AND      v.cid IN
               ( SELECT c.cid
               FROM    c
               WHERE   c.pid            = '3124'
               AND     c.s = 0
               AND
                       (
                               (
                                       c.cid IN
                                       ( /*Inner query 1*/ SELECT v.cid
                                       FROM    v
                                       WHERE   v.did      = 208
                                       AND     v.t         >= '2012-01-31 05:00:00'
                                       AND     v.t         <= '2012-02-08 07:59:59'
                                       )
                               )
                       AND
                               (
                                       c.cid IN
                                       ( /*Inner query 2*/ SELECT v.cid
                                       FROM    v
                                       WHERE   v.did      = 208
                                       AND     v.t         >= '2012-01-31 05:00:00'
                                       AND     v.t         <= '2012-02-08 07:59:59'
                                       )
                               )
                       AND
                               (
                                       c.cid IN
                                       ( /*Inner query 3*/ SELECT v.cid
                                       FROM    v
                                       WHERE  v.did      = 208
                                       AND     v.t         >= '2012-01-31 05:00:00'
                                       AND     v.t         <= '2012-02-08 07:59:59'
                                       )
                               )
                       )
               );

      mysql> show variables like 'optimizer_switch';
      +------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name    | Value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
      +------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=on,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on |
      +------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      1 row in set (0.02 sec)
       
      mysql> select version();
      +----------------------+
      | version()            |
      +----------------------+
      | 5.3.4-MariaDB-rc-log |
      +----------------------+
      1 row in set (0.00 sec)

      (revno: 3411)

      explain with no semijoin:

      +----+--------------------+-------+-----------------+-----------------+---------+---------+------------+------+-----------------------------------------------------------+
      | id | select_type        | table | type            | possible_keys   | key     | key_len | ref        | rows | Extra                                                     |
      +----+--------------------+-------+-----------------+-----------------+---------+---------+------------+------+-----------------------------------------------------------+
      |  1 | PRIMARY            | v     | range           | cid,did         | did     | 17      | NULL       |  275 | Using where; Using index; Using temporary; Using filesort |
      |  1 | PRIMARY            | c     | eq_ref          | PRIMARY         | PRIMARY | 8       | test.v.cid |    1 | Using where                                               |
      |  2 | DEPENDENT SUBQUERY | c     | unique_subquery | PRIMARY         | PRIMARY | 8       | func       |    1 | Using where                                               |
      |  5 | DEPENDENT SUBQUERY | v     | index_subquery  | PRIMARY,cid,did | cid     | 16      | func,const |    1 | Using index; Using where                                  |
      |  4 | DEPENDENT SUBQUERY | v     | index_subquery  | PRIMARY,cid,did | cid     | 16      | func,const |    1 | Using index; Using where                                  |
      |  3 | DEPENDENT SUBQUERY | v     | index_subquery  | PRIMARY,cid,did | cid     | 16      | func,const |    1 | Using index; Using where                                  |
      +----+--------------------+-------+-----------------+-----------------+---------+---------+------------+------+-----------------------------------------------------------+

      result:

      +----------+
      | count(*) |
      +----------+
      |      275 |
      +----------+
      1 row in set (0.01 sec)

      explain w/ semijoin (set session optimizer_switch='semijoin=on')

      +----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+
      | id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra                                     |
      +----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+
      |  1 | PRIMARY     | v     | range  | cid,did       | did     | 17      | NULL             |  275 | Using where; Using index                  |
      |  1 | PRIMARY     | v     | ref    | cid,did       | cid     | 16      | test.v.cid,const |    1 | Using where; Using index; Start temporary |
      |  1 | PRIMARY     | v     | ref    | cid,did       | cid     | 16      | test.v.cid,const |    1 | Using where; Using index                  |
      |  1 | PRIMARY     | v     | ref    | cid,did       | cid     | 16      | test.v.cid,const |    1 | Using where; Using index                  |
      |  1 | PRIMARY     | c     | eq_ref | PRIMARY       | PRIMARY | 8       | test.v.cid       |    1 | Using where                               |
      |  1 | PRIMARY     | c     | eq_ref | PRIMARY       | PRIMARY | 8       | test.v.cid       |    1 | Using where; End temporary                |
      +----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------------------------------------------+

      result:

      +----------+
      | count(*) |
      +----------+
      |      275 |
      +----------+
      1 row in set (0.51 sec)

      A couple of things I found:

      With semijoin off and more tables added as left joins in Inner query 1|2|3, the query takes approx the same times (or maybe linear increase). With semijoin on and more tables added as left joins to Inner query 1|2|3, query takes much more time (30s in my tests, so not linear)

      On my server with flashcache ssds backed stores and much higher end CPU (Core i7), the semijoin query takes .49s. With aws instance with trashy ebs and 2006 era AMD processors, semijoin varient takes 2.5s. The CPU/IO does not affect query times with semijoin=off. The result is always fast (0.01s).

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              fimbulvetr Dan Vande More
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.