Details
-
Bug
-
Status: Open (View Workflow)
-
Minor
-
Resolution: Unresolved
-
11.8.2
-
None
Description
Hi, MariaDB developers,
Thank you for taking the time to review this report. I'd like to suggest an optimization opportunity regarding SEMI JOIN operations.
SEMI JOINs often perform better with FirstMatch(), yet the optimizer of MariaDB does not choose it. Since these joins are frequent, prioritizing this strategy could yield broad speedups.
You can reproduce it as follows. As you can see, the SEMI JOIN query consumes 42.779 sec, even though there is only one row in t1;
CREATE TABLE t1(c1 INT8); |
CREATE TABLE t2(c2 INT8); |
INSERT INTO t2 SELECT seq FROM seq_1_to_10000000; |
|
|
-- FistMatch at the beginning value
|
INSERT INTO t1 VALUES(1); |
MariaDB [test]> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2); |
+------+ |
| c3 |
|
+------+ |
| 1 |
|
+------+ |
1 row in set (42.779 sec) |
|
|
-- FistMatch at the middle value
|
DELETE FROM t1; |
INSERT INTO t1 VALUES(5000000); |
MariaDB [test]> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2); |
+---------+ |
| c1 |
|
+---------+ |
| 5000000 |
|
+---------+ |
1 row in set (41.751 sec) |
|
|
-- FistMatch at the ending value
|
DELETE FROM t1; |
INSERT INTO t1 VALUES(10000000); |
MariaDB [test]> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2); |
+----------+ |
| c1 |
|
+----------+ |
| 10000000 |
|
+----------+ |
1 row in set (41.898 sec) |
|
|
MariaDB [test]> explain SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2) \G |
*************************** 1. row ***************************
|
id: 1
|
select_type: PRIMARY |
table: t1 |
type: ALL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: 1 |
Extra:
|
*************************** 2. row ***************************
|
id: 1
|
select_type: PRIMARY |
table: <subquery2> |
type: eq_ref
|
possible_keys: distinct_key
|
key: distinct_key |
key_len: 8
|
ref: func
|
rows: 1 |
Extra:
|
*************************** 3. row ***************************
|
id: 2
|
select_type: MATERIALIZED
|
table: t2 |
type: ALL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: 9728224 |
Extra:
|
3 rows in set (0.011 sec) |
The ANTI JOIN is a positive case, which may has adopted FirstMatch(). As shown below, the time consumed varies depending on the position of the match. Although the query plan does not specify whether FirstMatch() is used, I presume that it is used.
-- FistMatch at the beginning value
|
INSERT INTO t1 VALUES(1); |
SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2); |
Empty set (0.012 sec) |
|
|
-- FistMatch at the middle value
|
DELETE FROM t1; |
INSERT INTO t1 VALUES(5000000); |
SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2); |
Empty set (1.135 sec) |
|
|
-- FistMatch at the ending value
|
DELETE FROM t1; |
INSERT INTO t1 VALUES(10000000); |
SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2); |
Empty set (2.284 sec) |
|
|
|
|
explain SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE c1=c2); |
*************************** 1. row ***************************
|
id: 1
|
select_type: PRIMARY |
table: t1 |
type: ALL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: 1 |
Extra: Using where |
*************************** 2. row ***************************
|
id: 2
|
select_type: DEPENDENT SUBQUERY
|
table: t2 |
type: ALL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: 9728224 |
Extra: Using where |
2 rows in set (0.001 sec) |
Thus, my expected behavior is that the SEMI JOIN query should consume less time. For example, other DBMSs like MySQL has adopted FirstMatch() in SEMI JOIN, which only consumes 1/10 time of MairaDB when executing the same SEMI JOIN query. I am sorry that I used this comparison to illustrate this report. In my opinion, both MariaDB and MySQL are among the best RDBMSs.
mysql> SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2); |
+------+ |
| c1 |
|
+------+ |
| 1 |
|
+------+ |
1 row in set (4.060 sec) |
mysql> explain SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE c1=c2) \G |
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t1 |
partitions: NULL |
type: ALL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: 1 |
filtered: 100.00
|
Extra: NULL |
*************************** 2. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: t2 |
partitions: NULL |
type: ALL |
possible_keys: NULL |
key: NULL |
key_len: NULL |
ref: NULL |
rows: 9746243 |
filtered: 10.00
|
Extra: Using where; FirstMatch(t1); Using join buffer (hash join) |
2 rows in set, 2 warnings (0.001 sec) |