[MDEV-16337] Setting join_cache_level=4 changes efficient ref access plan to an inefficient hash join

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Fix Version/s: 10.4(EOL)
Component/s: Optimizer
Labels: None

Description

Creating the dataset

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table one_k(a int primary key);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t1 (a int, b int, c int, key(a));
create table t10 like t1;
insert into t10 select A.a +1000*B.a, A.a +1000*B.a,A.a +1000*B.a from one_k A, one_k B;

analyze table ten;
analyze table t10;
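
As a sanity check on the dataset, the arithmetic in the two INSERT ... SELECT statements can be reproduced outside the server. The following is a minimal Python sketch, not part of the original report; the variable names are illustrative. It suggests that one_k holds 1000 distinct values and that t10.a holds 1,000,000 distinct values, so each value of ten.a should match exactly one row in t10.

```python
from itertools import product

# Table ten: the values 0..9.
ten = list(range(10))

# one_k: A.a + B.a*10 + C.a*100 over the cross join ten A, ten B, ten C.
one_k = [a + b * 10 + c * 100 for a, b, c in product(ten, repeat=3)]

# t10.a: A.a + 1000*B.a over the cross join one_k A, one_k B.
# A set keeps only distinct values, so its size shows whether any repeat.
t10_a = {a + 1000 * b for a, b in product(one_k, repeat=2)}

print(len(one_k), len(set(one_k)))         # 1000 1000
print(len(t10_a), min(t10_a), max(t10_a))  # 1000000 0 999999
```

Because the cross join produces 1000 * 1000 rows and all 1,000,000 values are distinct, the estimate of 1 row per lookup in the ref plan matches the data.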

Good query plan

MariaDB [test]> set join_cache_level=2;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]>
MariaDB [test]> explain select * from ten, t10 where t10.a=ten.a;
+------+-------------+-------+------+---------------+------+---------+------------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref        | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------------+------+-------------+
|    1 | SIMPLE      | ten   | ALL  | NULL          | NULL | NULL    | NULL       |   10 | Using where |
|    1 | SIMPLE      | t10   | ref  | a             | a    | 5       | test.ten.a |    1 |             |
+------+-------------+-------+------+---------------+------+---------+------------+------+-------------+
2 rows in set (0.001 sec)

These are the default settings (join_cache_level=2). In this case:

* we read 10 rows from table ten
* for each of them we make an index lookup into t10, where we expect to find 1 row.
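
The ref plan amounts to an index nested-loop join. Below is a minimal Python sketch of that access pattern, not taken from the report: a dict stands in for the index on t10.a, t10 is scaled down to 10,000 rows, and the function name ref_join is hypothetical. It models only how many rows are touched, not MariaDB internals.

```python
from collections import defaultdict

def ref_join(outer_rows, inner_index):
    """Index nested-loop join: one index lookup per outer row."""
    lookups = 0
    rows_read = 0
    result = []
    for key in outer_rows:
        lookups += 1
        matches = inner_index.get(key, [])  # index lookup on t10.a
        rows_read += len(matches)
        result.extend((key, row) for row in matches)
    return result, lookups, rows_read

ten = list(range(10))
# Assumption: scaled-down stand-in for the 1M-row t10 (columns a, b, c).
t10 = [(a, a, a) for a in range(10_000)]

# Build the stand-in for key(a).
index_a = defaultdict(list)
for row in t10:
    index_a[row[0]].append(row)

result, lookups, rows_read = ref_join(ten, index_a)
print(lookups, rows_read, len(result))  # 10 10 10
```

The work is proportional to the 10 outer rows and does not grow with the size of t10.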

Bad query plan

MariaDB [test]> set join_cache_level=4;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> explain select * from ten, t10 where t10.a=ten.a;
+------+-------------+-------+----------+---------------+---------+---------+------------+--------+-------------------------------------+
| id   | select_type | table | type     | possible_keys | key     | key_len | ref        | rows   | Extra                               |
+------+-------------+-------+----------+---------------+---------+---------+------------+--------+-------------------------------------+
|    1 | SIMPLE      | ten   | ALL      | NULL          | NULL    | NULL    | NULL       |     10 | Using where                         |
|    1 | SIMPLE      | t10   | hash_ALL | a             | #hash#a | 5       | test.ten.a | 997980 | Using join buffer (flat, BNLH join) |
+------+-------------+-------+----------+---------------+---------+---------+------------+--------+-------------------------------------+
2 rows in set (0.001 sec)

In this case the optimizer:

* reads 10 rows from table ten
* puts them into a join buffer
* creates a hash index on the buffer
* then does a full table scan on t10, reading about 1M rows (the estimate is 997980).

This is far less efficient than the ref plan, which needs only 10 index lookups.
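
The steps listed for the BNLH plan can be sketched the same way. This is a minimal Python model under stated assumptions, not MariaDB's implementation: a dict stands in for the hash index on the join buffer, t10 is scaled down to 10,000 rows, and the function name bnlh_join is hypothetical.

```python
def bnlh_join(outer_rows, inner_rows):
    """Block nested-loop hash join, assuming all outer rows fit in one buffer."""
    # Buffer the outer rows and build a hash index on the join key.
    buffer_hash = {}
    for key in outer_rows:
        buffer_hash.setdefault(key, []).append(key)

    # Full scan of the inner table, probing the hash for every row.
    rows_scanned = 0
    result = []
    for row in inner_rows:
        rows_scanned += 1
        for key in buffer_hash.get(row[0], ()):
            result.append((key, row))
    return result, rows_scanned

ten = list(range(10))
# Assumption: scaled-down stand-in for the 1M-row t10 (columns a, b, c).
t10 = [(a, a, a) for a in range(10_000)]

result, rows_scanned = bnlh_join(ten, t10)
print(rows_scanned, len(result))  # 10000 10
```

Every row of t10 is read to produce the same 10 result rows, so the work grows with the size of t10 rather than with the 10 buffered rows.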

Issue Links

relates to

* MDEV-16307 Incorrect results when using BNLH join instead of BNL join with views (Closed)
* MDEV-22383 Use Block Nested Loops Hash Join by default when appropriate (Stalled)
* MDEV-35855 Make it possible to enable BNL-H join without hitting regressions (Open)

Activity

Varun Gupta (Inactive) created issue - 2018-05-30 14:27

Varun Gupta (Inactive) made changes - 2018-05-30 14:27

Field	Original Value	New Value
Fix Version/s		5.5 [ 15800 ]
Fix Version/s		10.0 [ 16000 ]
Fix Version/s		10.1 [ 16100 ]
Fix Version/s		10.2 [ 14601 ]
Fix Version/s		10.3 [ 22126 ]

Varun Gupta (Inactive) made changes - 2018-05-30 14:40

Description

Varun Gupta (Inactive) made changes - 2018-05-30 14:42

Description

Varun Gupta (Inactive) made changes - 2018-05-30 14:43

Link

This issue blocks ~~MDEV-15253~~ [ ~~MDEV-15253~~ ]

Varun Gupta (Inactive) made changes - 2018-05-30 14:44

Link

This issue relates to ~~MDEV-16307~~ [ ~~MDEV-16307~~ ]

Varun Gupta (Inactive) made changes - 2018-05-30 14:48

Summary

Picking an efficient plan by changing ref access to hash join

Setting join_cache_level=4 changes efficient ref access plan to an inefficient hash join

Varun Gupta (Inactive) made changes - 2018-06-07 18:53

Link

This issue blocks ~~MDEV-15253~~ [ ~~MDEV-15253~~ ]

Sergei Golubchik made changes - 2019-03-29 12:05

Fix Version/s

10.4 [ 22408 ]

Sergei Petrunia made changes - 2020-05-12 17:23

Link

This issue relates to MDEV-22383 [ MDEV-22383 ]

Julien Fritsch made changes - 2020-12-01 12:29

Fix Version/s

5.5 [ 15800 ]

Julien Fritsch made changes - 2020-12-01 12:34

Fix Version/s

10.0 [ 16000 ]

Julien Fritsch made changes - 2020-12-01 12:39

Fix Version/s

10.1 [ 16100 ]

Julien Fritsch made changes - 2021-03-19 14:16

Assignee

Varun Gupta [ varun ]

Sergei Petrunia [ psergey ]

Sergei Golubchik made changes - 2021-12-06 21:33

Workflow

MariaDB v3 [ 87573 ]

MariaDB v4 [ 140804 ]

Ralf Gebhardt made changes - 2022-08-04 08:43

Fix Version/s

10.2 [ 14601 ]

Julien Fritsch made changes - 2023-04-27 14:26

Fix Version/s

10.3 [ 22126 ]

Sergei Petrunia made changes - 2025-02-18 13:45

Description

Sergei Petrunia made changes - 2025-03-17 09:02

Link

This issue relates to MDEV-35855 [ MDEV-35855 ]

People

Assignee: Sergei Petrunia

Reporter: Varun Gupta (Inactive)

Votes: 2

Watchers: 4

Dates

Created: 2018-05-30 14:27

Updated: 2025-03-17 09:02
