[MDEV-10946] Partition pruning not working with LEFT JOIN Created: 2016-10-03  Updated: 2024-01-08

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer, Partitioning
Affects Version/s: 10.1.16, 10.1.18, 10.2.2, 10.5
Fix Version/s: 10.6

Type: Bug Priority: Minor
Reporter: Valerii Kravchuk Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 1
Labels: optimizer-feature, upstream

Issue Links:
Relates
relates to MDEV-17493 Partition pruning doesn't work for ne... Closed
relates to MDEV-8359 WHERE condition referring to inner ta... Stalled
Sprint: 10.2.3-2, 10.2.4-4, 10.2.4-1

 Description   

Partition pruning does not happen in case of LEFT JOIN in the following simple case:

drop table a, b, c;
create table a (
a_id bigint auto_increment not null,
d date,
b_id bigint,
c_id bigint,
primary key (a_id, d),
key(b_id, d),
key(c_id, d)
)
/*!50500 PARTITION BY RANGE COLUMNS(d)
(PARTITION p160709 VALUES LESS THAN ('2016-07-10') ENGINE = InnoDB,
PARTITION p160710 VALUES LESS THAN ('2016-07-11') ENGINE = InnoDB,
PARTITION p160711 VALUES LESS THAN ('2016-07-12') ENGINE = InnoDB,
PARTITION p160712 VALUES LESS THAN ('2016-07-13') ENGINE = InnoDB,
PARTITION p9999 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;
 
create table b (
b_id bigint auto_increment not null,
d date,
primary key (b_id, d))
/*!50500 PARTITION BY RANGE COLUMNS(d)
(PARTITION p160709 VALUES LESS THAN ('2016-07-10') ENGINE = InnoDB,
PARTITION p160710 VALUES LESS THAN ('2016-07-11') ENGINE = InnoDB,
PARTITION p160711 VALUES LESS THAN ('2016-07-12') ENGINE = InnoDB,
PARTITION p160712 VALUES LESS THAN ('2016-07-13') ENGINE = InnoDB,
PARTITION p9999 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;
 
create table c (
c_id bigint auto_increment not null,
d date,
primary key (c_id, d))
/*!50500 PARTITION BY RANGE COLUMNS(d)
(PARTITION p160709 VALUES LESS THAN ('2016-07-10') ENGINE = InnoDB,
PARTITION p160710 VALUES LESS THAN ('2016-07-11') ENGINE = InnoDB,
PARTITION p160711 VALUES LESS THAN ('2016-07-12') ENGINE = InnoDB,
PARTITION p160712 VALUES LESS THAN ('2016-07-13') ENGINE = InnoDB,
PARTITION p9999 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;
 
insert into c(d) values ( date_add('2016-07-13', INTERVAL -floor(rand()*5) DAY));
insert into c(d) values ( date_add('2016-07-13', INTERVAL -floor(rand()*5) DAY));
replace into c(d) select date_add('2016-07-13', INTERVAL -floor(rand()*5) DAY) from c t1, c t2, c t3, c t4, c t5, c t6, c t7, c t8, c t9, c t10, c t11, c t12, c t13, c t14, c t15, c t16, c t17, c t18;
 
insert into b(d) values ( date_add('2016-07-13', INTERVAL -floor(rand()*5) DAY));
insert into b(d) values ( date_add('2016-07-13', INTERVAL -floor(rand()*5) DAY));
replace into b(d) select date_add('2016-07-13', INTERVAL -floor(rand()*5) DAY) from b t1, b t2, b t3, b t4, b t5, b t6, b t7, b t8, b t9, b t10, b t11, b t12, b t13, b t14, b t15, b t16, b t17, b t18;
 
insert into a(d, b_id, c_id) select b.d, b_id, c_id from b LEFT JOIN c on b.d = c.d and mod(c_id, 3) = 0 limit 1000000;

Now with these tables and data we have the following for INNER JOIN:

MariaDB [test]> select version();
+-----------------+
| version()       |
+-----------------+
| 10.1.18-MariaDB |
+-----------------+
1 row in set (0.00 sec)
 
MariaDB [test]> explain partitions select a.a_id , a.b_id, c.c_id from a JOIN b on b.b_id = a.b_id and a.d = b.d INNER JOIN c on a.c_id = c.c_id and a.d = c.d where a.d BETWEEN '2016-07-11' AND '2016-07-11' LIMIT 10;
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
| id   | select_type | table | partitions | type   | possible_keys | key     | key_len | ref               | rows | Extra                    |
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
|    1 | SIMPLE      | a     | p160711    | ALL    | b_id,c_id     | NULL    | NULL    | NULL              |    2 | Using where              |
|    1 | SIMPLE      | b     | p160711    | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.b_id,const |    1 | Using where; Using index |
|    1 | SIMPLE      | c     | p160711    | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.c_id,const |    1 | Using where; Using index |
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
3 rows in set (0.00 sec)

But if we use LEFT JOIN to c:

MariaDB [test]> explain partitions select a.a_id , a.b_id, c.c_id from a JOIN b on b.b_id = a.b_id and a.d = b.d LEFT JOIN c on a.c_id = c.c_id and a.d = c.d where a.d BETWEEN '2016-07-11' AND '2016-07-11' LIMIT 10;
+------+-------------+-------+---------------------------------------+--------+---------------+---------+---------+----------------------+------+--------------------------+
| id   | select_type | table | partitions                            | type   | possible_keys | key     | key_len | ref                  | rows | Extra                    |
+------+-------------+-------+---------------------------------------+--------+---------------+---------+---------+----------------------+------+--------------------------+
|    1 | SIMPLE      | a     | p160711                               | ALL    | b_id          | NULL    | NULL    | NULL                 |    2 | Using where              |
|    1 | SIMPLE      | b     | p160711                               | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.b_id,const    |    1 | Using where; Using index |
|    1 | SIMPLE      | c     | p160709,p160710,p160711,p160712,p9999 | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.c_id,test.a.d |    1 | Using where; Using index |
+------+-------------+-------+---------------------------------------+--------+---------------+---------+---------+----------------------+------+--------------------------+
3 rows in set (0.00 sec)

So, pruning do not happen while accessing c table. There is an easy workaround - just add the condition to ON clause in any of the following ways:

MariaDB [test]> explain partitions select a.a_id , a.b_id, c.c_id from a JOIN b on b.b_id = a.b_id and a.d = b.d LEFT JOIN c on a.c_id = c.c_id and a.d = c.d and c.d BETWEEN '2016-07-11' AND '2016-07-11' where a.d BETWEEN '2016-07-11' AND '2016-07-11' LIMIT 10;
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
| id   | select_type | table | partitions | type   | possible_keys | key     | key_len | ref               | rows | Extra                    |
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
|    1 | SIMPLE      | a     | p160711    | ALL    | b_id          | NULL    | NULL    | NULL              |    2 | Using where              |
|    1 | SIMPLE      | b     | p160711    | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.b_id,const |    1 | Using where; Using index |
|    1 | SIMPLE      | c     | p160711    | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.c_id,const |    1 | Using where; Using index |
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test]> explain partitions select a.a_id , a.b_id, c.c_id from a JOIN b on b.b_id = a.b_id and a.d = b.d LEFT JOIN c on a.c_id = c.c_id and a.d = c.d and a.d BETWEEN '2016-07-11' AND '2016-07-11' where a.d BETWEEN '2016-07-11' AND '2016-07-11' LIMIT 10;
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
| id   | select_type | table | partitions | type   | possible_keys | key     | key_len | ref               | rows | Extra                    |
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
|    1 | SIMPLE      | a     | p160711    | ALL    | b_id          | NULL    | NULL    | NULL              |    2 | Using where              |
|    1 | SIMPLE      | b     | p160711    | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.b_id,const |    1 | Using where; Using index |
|    1 | SIMPLE      | c     | p160711    | eq_ref | PRIMARY       | PRIMARY | 11      | test.a.c_id,const |    1 | Using where; Using index |
+------+-------------+-------+------------+--------+---------------+---------+---------+-------------------+------+--------------------------+
3 rows in set (0.00 sec)

There are reasons to think that upstream MySQL (5.7.15) is also affected.



 Comments   
Comment by Valerii Kravchuk [ 2016-10-03 ]

Reported for upstream MySQL as http://bugs.mysql.com/bug.php?id=83248. Version 8.0.0 is also affected.

Comment by Sergei Petrunia [ 2016-10-12 ]

So,
partitioning on table c is: PARTITION BY RANGE COLUMNS(d)

The WHERE condition refers to table a:

where a.d BETWEEN '2016-07-11' AND '2016-07-11' 

the ON expression on the LEFT JOIN is:

LEFT JOIN c on a.c_id = c.c_id and a.d = c.d 

and the issue is that the ON expression has a.d = c.d, but this is not used for partition pruning.

Comment by Sergei Petrunia [ 2016-10-12 ]

See also MDEV-8359... it's a similar, but not exactly the same issue.

Comment by Sergei Petrunia [ 2016-10-13 ]

Stripping the testcase to the bare minimum:

create table t10 (a int);
insert into t10 select a from test.ten;

create table t11 (a int, b int, key(b)) partition by hash(a) partitions 4;
insert into t11 select a,a from test.one_k;

explain partitions 
select * from t10 left join t11 on t11.a=t10.a
where 
t10.a in (1,2);
+------+-------------+-------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id   | select_type | table | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+------+-------------+-------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|    1 | SIMPLE      | t10   | NULL        | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                                     |
|    1 | SIMPLE      | t11   | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+

The above uses 4 partitions.

if one puts the IN(...) into ON it will use just two:

explain partitions 
select * from t10 left join t11 on t11.a=t10.a and t11.a in (1,2)
where 
t10.a in (1,2);
+------+-------------+-------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id   | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+------+-------------+-------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|    1 | SIMPLE      | t10   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                                     |
|    1 | SIMPLE      | t11   | p1,p2      | ALL  | NULL          | NULL | NULL    | NULL |  500 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------------+------+---------------+------+---------+------+------+-------------------------------------------------+

and putting IN into the ON clause doesn't change the query semantics. Now, need to generalize this...

Comment by Sergei Petrunia [ 2016-10-13 ]

The proposed fix for MDEV-8359 doesn't fix this issue.

Comment by Sergei Petrunia [ 2016-10-15 ]

Let's consider

SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a
WHERE 
  cond(t1.a) AND ...

The question is, can we use cond(t1.a) and t1.a=t2.a to infer that we don't need rows that do not match cond(t2.a) ?

In general, one can't move conditions from WHERE into ON. (Suppose, it's otherwise. This means we don't care whether an NULL-complemented row was generated, and LEFT JOIN is the same as INNER JOIN. This holds only for certain special cases).

On the other hand, we only need to read rows of table t2 that may form an ON-expression match with a row from t1. Moreover, the row from t1 must have a chance to contribute to SELECT's output.

In the query we're looking at:

  • rows in t1 that do not satisfy cond(t1.a) will not contribute to SELECT's output.
  • The equality t1.a=t2.a projects this restriction onto the inner table:
  • Rows in t2 that do not satisfy cond(t2.a) do not need to be read.

That is,

  • We can look at the WHERE, take a restriction on an OUTER table (not INNER!),
  • Then use LEFT JOIN ON expression's multiple equalities to map this restriction onto the inner table
  • and the mapped restriction can be used to limit which rows are read from INNER tables (again: parts of WHERE that already refer to the INNER table cannot be used)
Comment by Sergei Petrunia [ 2016-10-15 ]

How does this apply to nested joins...

SELECT ...
FROM 
  t1 LEFT JOIN (
    t2 LEFT JOIN t3 ON on_expr2
  ) ON on_expr1
WHERE 
  where_cond

Can we use parts of where_cond to limit rows read for t3?
It seems we can, as long as both on_expr1 and on_expr2 have the equalities for doing so.
This seems to be too complicated.

We can use on_expr1 to limit rows read from t3.

Comment by Arnaud Adant [ 2017-09-18 ]

@psergey : is it difficult to fix ?

Comment by Sergei Petrunia [ 2018-10-19 ]

Have been investigating this MDEV, and found another one: MDEV-17493. It's a different issue but it's worth mentioning. That issue affects upstream, too

Comment by Sergei Petrunia [ 2018-10-22 ]
  • Have been investigating this.
  • Filed MDEV-17493 and MDEV-17518 - these are related shortcomings of outer join optimization
  • They should be done before this MDEV.
  • I have a patch for MDEV-17518.
Comment by Sergei Petrunia [ 2018-10-23 ]

Patches for MDEV-17493 and MDEV-17518 are now available

Comment by Sergei Petrunia [ 2018-11-06 ]

Fixed for MDEV-17493 and MDEV-17518 are now pushed. They were a pre-requisites for a fix for this issue.

Generated at Thu Feb 08 07:46:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.