[MDEV-28991] optimizer aware that inner join implies not null Created: 2022-07-01  Updated: 2022-07-08

Status: Open
Project: MariaDB Server
Component/s: Server
Fix Version/s: None

Type: Task Priority: Minor
Reporter: MG Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: optimizer

Attachments: File MDEV-28991_sql_dump.xz    

 Description   

You can reduce Handler_read_next in an INNER JOIN by specifying the column to be joined IS NOT NULL in the where clause, which is implicitly true:

-- 15% execution time reduction with large number of matching rows
select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=98765;
select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=98765 and t2.a is not null;

-- 80% execution time reduction with smaller number of matching rows
select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321;
select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321 and t2.a is not null;

I kept buffer pool as 128MB since the tables are pretty small on disk and the results are more pronounced when not all data fits in memory.

I wanted to include a way to generate this with sequence tables (which is what I used) but a dump file would be a lot easier and takes about 5 minutes to load on my pretty small instance. Is there a way to send a 300+MB file?



 Comments   
Comment by MG [ 2022-07-08 ]

The gist is that the explicit IS NOT NULL criteria, on the column to be joined to the left, causes more of the existing index to be used and reduces Handler_read_next.

I had previously uploaded a larger dump where effects were more pronounced but I have a minified dump today with these results:

MariaDB [mdev28991]> explain select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321;
+------+-------------+-------+--------+---------------+---------+---------+----------------+---------+--------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref            | rows    | Extra                    |
+------+-------------+-------+--------+---------------+---------+---------+----------------+---------+--------------------------+
|    1 | SIMPLE      | t2    | ref    | ba            | ba      | 5       | const          | 1373638 | Using where; Using index |
|    1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | mdev28991.t2.a | 1       | Using index              |
+------+-------------+-------+--------+---------------+---------+---------+----------------+---------+--------------------------+
2 rows in set (0.001 sec)
 
MariaDB [mdev28991]> select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321;
+----------+
| count(*) |
+----------+
|      765 |
+----------+
1 row in set (0.370 sec)
 
MariaDB [mdev28991]> select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321;
+----------+
| count(*) |
+----------+
|      765 |
+----------+
1 row in set (0.371 sec)
 
MariaDB [mdev28991]> explain select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321 and t2.a is not null;
+------+-------------+-------+--------+---------------+---------+---------+----------------+--------+--------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref            | rows   | Extra                    |
+------+-------------+-------+--------+---------------+---------+---------+----------------+--------+--------------------------+
|    1 | SIMPLE      | t2    | range  | ba            | ba      | 10      | NULL           | 122522 | Using where; Using index |
|    1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | mdev28991.t2.a | 1      | Using index              |
+------+-------------+-------+--------+---------------+---------+---------+----------------+--------+--------------------------+
2 rows in set (0.001 sec)
 
MariaDB [mdev28991]> select count(*) from t1 inner join t2 on t1.id=t2.a where t2.b=54321 and t2.a is not null;
+----------+
| count(*) |
+----------+
|      765 |
+----------+
1 row in set (0.161 sec)

Comment by MG [ 2022-07-08 ]

By the way, I am not sure whether it would matter that t1.id is PK (not null), since join wouldn't match those rows:

MariaDB [mdev28991]> alter table t1 drop primary key, change id id int, add index ix_id (`id`); insert into t1 values (null);
Query OK, 2314395 rows affected (19.013 sec)
Records: 2314395  Duplicates: 0  Warnings: 0
 
Query OK, 1 row affected (0.005 sec)
 
MariaDB [mdev28991]> select count(*) from t1 inner join t2 on (t1.id=t2.a) where t2.a is null;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.865 sec)
 
MariaDB [mdev28991]> select count(*) from t1 inner join t2 on (t1.id=t2.a or (t1.id is null and t2.a is null)) where t2.a is null;
+----------+
| count(*) |
+----------+
|  1500002 |
+----------+
1 row in set (4.818 sec)

Generated at Thu Feb 08 10:05:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.