Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17493

Partition pruning doesn't work for nested outer joins

Details

    Description

      Consider this example

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table t1 (a int, b int, c int);
      insert into t1 select a,a,a from ten;
      create table t2 (a int, b int, c int);
      insert into t2 select a,a,a from ten;
       
      create table tp (
        part_id int,
        a int
      ) partition by list (part_id) (
        partition p0 values in (0),
        partition p1 values in (1),
        partition p2 values in (2),
        partition p3 values in (3),
        partition p4 values in (4)
      );
      insert into tp select mod(a,5), a from ten;
      

      Let's put the partitioned table on the inner side of the outer join, and let the ON expression have a condition that's usable by partition pruning:

       explain partitions  select * from t1 left join tp on (t1.a=tp.a and tp.part_id=3);
      +------+-------------+-------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
      | id   | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
      +------+-------------+-------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
      |    1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                                 |
      |    1 | SIMPLE      | tp    | p3         | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
      

      tp.part_id=3 gets us tp.partitions=p3. Good so far.

      Now, let's change the outer join to have two tables:

      mysql> explain partitions
          -> select * from t1 left join (tp join t2) on (t1.a=tp.a and tp.a=t2.b and tp.part_id=3);
      +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+
      | id   | select_type | table | partitions     | type | possible_keys | key  | key_len | ref  | rows | Extra                                                  |
      +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+
      |    1 | SIMPLE      | t1    | NULL           | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                                        |
      |    1 | SIMPLE      | tp    | p0,p1,p2,p3,p4 | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer (flat, BNL join)        |
      |    1 | SIMPLE      | t2    | NULL           | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer (incremental, BNL join) |
      +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+
      

      now, tp.partitions includes all partitions even if the ON expression has a condition that restricts it to one.

      Attachments

        Issue Links

          Activity

            Just to be 100% sure, let's see that the regular range optimizer is able to use the ON expression in this case:

            mysql> alter table tp add key(part_id);
            Query OK, 0 rows affected (0.28 sec)
            Records: 0  Duplicates: 0  Warnings: 0
            

            mysql> explain partitions select * from t1 left join (tp force index(part_id) join t2) on (t1.a=tp.a and tp.a=t2.b and tp.part_id=3);
            +------+-------------+-------+----------------+------+---------------+---------+---------+-------+------+-------------+
            | id   | select_type | table | partitions     | type | possible_keys | key     | key_len | ref   | rows | Extra       |
            +------+-------------+-------+----------------+------+---------------+---------+---------+-------+------+-------------+
            |    1 | SIMPLE      | t1    | NULL           | ALL  | NULL          | NULL    | NULL    | NULL  |   10 |             |
            |    1 | SIMPLE      | tp    | p0,p1,p2,p3,p4 | ref  | part_id       | part_id | 5       | const |    1 | Using where |
            |    1 | SIMPLE      | t2    | NULL           | ALL  | NULL          | NULL    | NULL    | NULL  |   10 | Using where |
            +------+-------------+-------+----------------+------+---------------+---------+---------+-------+------+-------------+
            3 rows in set (0.01 sec)
            

            Indeed, table tp now uses ref access on part_id.

            psergei Sergei Petrunia added a comment - Just to be 100% sure, let's see that the regular range optimizer is able to use the ON expression in this case: mysql> alter table tp add key(part_id); Query OK, 0 rows affected (0.28 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain partitions select * from t1 left join (tp force index(part_id) join t2) on (t1.a=tp.a and tp.a=t2.b and tp.part_id=3); +------+-------------+-------+----------------+------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+----------------+------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | tp | p0,p1,p2,p3,p4 | ref | part_id | part_id | 5 | const | 1 | Using where | | 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | Using where | +------+-------------+-------+----------------+------+---------------+---------+---------+-------+------+-------------+ 3 rows in set (0.01 sec) Indeed, table tp now uses ref access on part_id.
            psergei Sergei Petrunia added a comment - - edited

            ... Well, actually it's not the range optimizer. The execution does not enter SQL_SELECT::test_quick_select.
            If I change the query to be unusable by ref access, table tp will use full table scan:

            explain partitions 
            select * 
            from 
              t1 left join (tp force index (part_id) join t2) 
                  on (t1.a=tp.a and tp.a=t2.b and tp.part_id in (2,3));
            +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+
            | id   | select_type | table | partitions     | type | possible_keys | key  | key_len | ref  | rows | Extra                                                  |
            +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+
            |    1 | SIMPLE      | t1    | NULL           | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                                        |
            |    1 | SIMPLE      | tp    | p0,p1,p2,p3,p4 | ALL  | part_id       | NULL | NULL    | NULL |   10 | Using where; Using join buffer (flat, BNL join)        |
            |    1 | SIMPLE      | t2    | NULL           | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer (incremental, BNL join) |
            +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+
            

            tp.possible_keys has part_id, though.

            psergei Sergei Petrunia added a comment - - edited ... Well, actually it's not the range optimizer. The execution does not enter SQL_SELECT::test_quick_select. If I change the query to be unusable by ref access, table tp will use full table scan: explain partitions select * from t1 left join (tp force index (part_id) join t2) on (t1.a=tp.a and tp.a=t2.b and tp.part_id in (2,3)); +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | tp | p0,p1,p2,p3,p4 | ALL | part_id | NULL | NULL | NULL | 10 | Using where; Using join buffer (flat, BNL join) | | 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (incremental, BNL join) | +------+-------------+-------+----------------+------+---------------+------+---------+------+------+--------------------------------------------------------+ tp.possible_keys has part_id, though.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.