Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17382

Hash join algorithm should not be used to join materialized derived table / view by equality

Details

    Description

      When a materialized derived table is joined using an equality predicate such that one of his parts is a field of the materialized table then hash join algorithm should not be used, because in this case a hash key is created for the table by which it is supposed to be accessed.
      No we see that is not so, for example, for the following test case:

      CREATE TABLE t1 (
        id int NOT NULL,
        amount decimal DEFAULT NULL,
      PRIMARY KEY (id)
      )  ENGINE=MyISAM;
       
      CREATE TABLE t2 (
        id int NOT NULL,
        name varchar(50) DEFAULT NULL,
      PRIMARY KEY (id)
      ) ENGINE=MyISAM;
       
      INSERT INTO t1 VALUES
      (1, 10.0000), (2, 20.0000), (3, 30.0000), (4, 40.0000),
      (5, NULL), (6, NULL), (7, 70.0000), (8, 80.0000);
       
      INSERT INTO t2 VALUES 
      (1,'A'), (2,'B'), (3,'C'), (4,'D'), (5, NULL), (6, NULL),
      (7,'E'), (8,'F'), (9,'G'), (10,'H'), (11, NULL), (12, NULL);
       
      set join_cache_level=4;
       
      EXPLAIN
      SELECT t2.id,t2.name,t.total_amt 
        FROM  t2
              LEFT JOIN
              (SELECT id, sum(amount) total_amt FROM t1 GROUP BY id) AS t
              ON t2.id=t.id
        WHERE t2.id < 3;
      

      j
      We see that hash join algorithm is applied for this query:

      MariaDB [test]> EXPLAIN
          -> SELECT t2.id,t2.name,t.total_amt 
          ->   FROM  t2
          ->         LEFT JOIN
          ->         (SELECT id, sum(amount) total_amt FROM t1 GROUP BY id) AS t
          ->         ON t2.id=t.id
          ->   WHERE t2.id < 3;
      +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
      | id   | select_type | table      | type     | possible_keys | key        | key_len | ref        | rows | Extra                               |
      +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
      |    1 | PRIMARY     | t2         | range    | PRIMARY       | PRIMARY    | 4       | NULL       |    2 | Using where                         |
      |    1 | PRIMARY     | <derived2> | hash_ALL | key0          | #hash#key0 | 5       | test.t2.id |    8 | Using join buffer (flat, BNLH join) |
      |    2 | DERIVED     | t1         | index    | NULL          | PRIMARY    | 4       | NULL       |    8 |                                     |
      +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
      

      Attachments

        Issue Links

          Activity

            igor Igor Babaev (Inactive) created issue -
            igor Igor Babaev (Inactive) made changes -
            Field Original Value New Value
            Fix Version/s 5.5 [ 15800 ]
            igor Igor Babaev (Inactive) made changes -
            igor Igor Babaev (Inactive) made changes -
            Description When a materialized derived table is joined using an equality predicate such that one of his parts is a field of the materialized table then hash join algorithm should not be used, because in this case a hash key is created for the table by which it is supposed to be accessed.
            No we see that is not so, for example, for the following test case:
            {code:sql}
            CREATE TABLE t1 (
              id int NOT NULL,
              amount decimal DEFAULT NULL,
            PRIMARY KEY (id)
            ) ENGINE=InnoDB;

            CREATE TABLE t2 (
              id int NOT NULL,
              name varchar(50) DEFAULT NULL,
            PRIMARY KEY (id)
            ) ENGINE=InnoDB;

            INSERT INTO t1 VALUES
            (1, 10.0000), (2, 20.0000), (3, 30.0000), (4, 40.0000),
            (5, NULL), (6, NULL), (7, 70.0000), (8, 80.0000);

            INSERT INTO t2 VALUES
            (1,'A'), (2,'B'), (3,'C'), (4,'D'), (5, NULL), (6, NULL),
            (7,'E'), (8,'F'), (9,'G'), (10,'H'), (11, NULL), (12, NULL);

            set join_cache_level=4;

            EXPLAIN
            SELECT t2.id,t2.name,t.total_amt
              FROM t2
                    LEFT JOIN
                    (SELECT id, sum(amount) total_amt FROM t1 GROUP BY id) AS t
                    ON t2.id=t.id
              WHERE t2.id < 3;
            {code}j
            We see that hash join algorithm is applied for this query:
            {noformat}
            MariaDB [test]> EXPLAIN
                -> SELECT t2.id,t2.name,t.total_amt
                -> FROM t2
                -> LEFT JOIN
                -> (SELECT id, sum(amount) total_amt FROM t1 GROUP BY id) AS t
                -> ON t2.id=t.id
                -> WHERE t2.id < 3;
            +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
            | 1 | PRIMARY | t2 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where |
            | 1 | PRIMARY | <derived2> | hash_ALL | key0 | #hash#key0 | 5 | test.t2.id | 8 | Using join buffer (flat, BNLH join) |
            | 2 | DERIVED | t1 | index | NULL | PRIMARY | 4 | NULL | 8 | |
            +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
            {noformat}

            When a materialized derived table is joined using an equality predicate such that one of his parts is a field of the materialized table then hash join algorithm should not be used, because in this case a hash key is created for the table by which it is supposed to be accessed.
            No we see that is not so, for example, for the following test case:
            {code:sql}
            CREATE TABLE t1 (
              id int NOT NULL,
              amount decimal DEFAULT NULL,
            PRIMARY KEY (id)
            ) ENGINE=MyISAM;

            CREATE TABLE t2 (
              id int NOT NULL,
              name varchar(50) DEFAULT NULL,
            PRIMARY KEY (id)
            ) ENGINE=MyISAM;

            INSERT INTO t1 VALUES
            (1, 10.0000), (2, 20.0000), (3, 30.0000), (4, 40.0000),
            (5, NULL), (6, NULL), (7, 70.0000), (8, 80.0000);

            INSERT INTO t2 VALUES
            (1,'A'), (2,'B'), (3,'C'), (4,'D'), (5, NULL), (6, NULL),
            (7,'E'), (8,'F'), (9,'G'), (10,'H'), (11, NULL), (12, NULL);

            set join_cache_level=4;

            EXPLAIN
            SELECT t2.id,t2.name,t.total_amt
              FROM t2
                    LEFT JOIN
                    (SELECT id, sum(amount) total_amt FROM t1 GROUP BY id) AS t
                    ON t2.id=t.id
              WHERE t2.id < 3;
            {code}j
            We see that hash join algorithm is applied for this query:
            {noformat}
            MariaDB [test]> EXPLAIN
                -> SELECT t2.id,t2.name,t.total_amt
                -> FROM t2
                -> LEFT JOIN
                -> (SELECT id, sum(amount) total_amt FROM t1 GROUP BY id) AS t
                -> ON t2.id=t.id
                -> WHERE t2.id < 3;
            +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
            | 1 | PRIMARY | t2 | range | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where |
            | 1 | PRIMARY | <derived2> | hash_ALL | key0 | #hash#key0 | 5 | test.t2.id | 8 | Using join buffer (flat, BNLH join) |
            | 2 | DERIVED | t1 | index | NULL | PRIMARY | 4 | NULL | 8 | |
            +------+-------------+------------+----------+---------------+------------+---------+------------+------+-------------------------------------+
            {noformat}

            igor Igor Babaev (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            igor Igor Babaev (Inactive) made changes -
            Summary Hash join algorithm should not be used for equi-join of materialized derived table or view Hash join algorithm should not be used to join materialized derived table / view by equality

            A fix for this problem was pushed into 5.5 and cherry-picked into 10.0,10.1,10.2,10.3.

            igor Igor Babaev (Inactive) added a comment - A fix for this problem was pushed into 5.5 and cherry-picked into 10.0,10.1,10.2,10.3.
            igor Igor Babaev (Inactive) made changes -
            Fix Version/s 5.5.62 [ 22915 ]
            Fix Version/s 5.5 [ 15800 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            Richard Richard Stracke made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 89935 ] MariaDB v4 [ 133709 ]

            People

              igor Igor Babaev (Inactive)
              igor Igor Babaev (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.