[MDEV-30514] Different plan and slower query with histogram type JSON_HB Created: 2023-01-30  Updated: 2023-02-14  Resolved: 2023-02-07

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.8, 10.9, 10.10, 10.11
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Won't Fix Votes: 0
Labels: 11.0-sel

Attachments: Text File mdev30514-bb-11.0-try-good-join-order.txt    

 Description   

Set to critical, because in 11.0 JSON_HB is planned to become a default value, so it will become a regression. At least its extent needs to be analyzed.

CREATE DATABASE dbt3;
USE dbt3;
 
set default_storage_engine=Aria;
 
--source include/dbt3_s001.inc
 
ANALYZE TABLE part, partsupp, supplier, lineitem, orders, customer PERSISTENT FOR ALL;
 
--start_timer
SELECT COUNT(*) AS cnt FROM part LEFT JOIN partsupp ON ( p_partkey = ps_partkey ) JOIN supplier ON (s_suppkey = ps_suppkey) LEFT JOIN lineitem ON ( ps_suppkey = l_suppkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer ON ( o_custkey = c_custkey ) WHERE ps_partkey < 130 AND ps_comment IS NOT NULL HAVING cnt < 0;
--end_timer
 
EXPLAIN EXTENDED
SELECT COUNT(*) AS cnt FROM part LEFT JOIN partsupp ON ( p_partkey = ps_partkey ) JOIN supplier ON (s_suppkey = ps_suppkey) LEFT JOIN lineitem ON ( ps_suppkey = l_suppkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer ON ( o_custkey = c_custkey ) WHERE ps_partkey < 130 AND ps_comment IS NOT NULL HAVING cnt < 0;
 
DROP DATABASE dbt3;

With DOUBLE_PREC_HB, the query takes 0.14 sec on my machine, on 10.8.6 release build. With JSON_HB it is 1.8 sec.

Plans:

10.8.6 with DOUBLE_PREC_HB

+------+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------------------------+------+----------+--------------------------------------------------------------+
| id   | select_type | table    | type   | possible_keys                                          | key         | key_len | ref                                         | rows | filtered | Extra                                                        |
+------+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------------------------+------+----------+--------------------------------------------------------------+
|    1 | SIMPLE      | supplier | index  | PRIMARY                                                | PRIMARY     | 4       | NULL                                        | 10   |   100.00 | Using index                                                  |
|    1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey | 5       | test.supplier.s_suppkey                     | 600  |   100.00 |                                                              |
|    1 | SIMPLE      | orders   | eq_ref | PRIMARY,i_o_custkey                                    | PRIMARY     | 4       | test.lineitem.l_orderkey                    | 1    |   100.00 | Using where                                                  |
|    1 | SIMPLE      | customer | eq_ref | PRIMARY                                                | PRIMARY     | 4       | test.orders.o_custkey                       | 1    |   100.00 | Using index                                                  |
|    1 | SIMPLE      | part     | range  | PRIMARY                                                | PRIMARY     | 4       | NULL                                        | 129  |   100.00 | Using where; Using index; Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | partsupp | eq_ref | PRIMARY,i_ps_partkey,i_ps_suppkey                      | PRIMARY     | 8       | test.part.p_partkey,test.supplier.s_suppkey | 1    |    99.22 | Using where                                                  |
+------+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------------------------+------+----------+--------------------------------------------------------------+
| Note  | 1003 | select count(0) AS `cnt` from `test`.`part` join `test`.`partsupp` join `test`.`supplier` join `test`.`lineitem` join `test`.`orders` join `test`.`customer` where `test`.`customer`.`c_custkey` = `test`.`orders`.`o_custkey` and `test`.`orders`.`o_orderkey` = `test`.`lineitem`.`l_orderkey` and `test`.`lineitem`.`l_suppkey` = `test`.`supplier`.`s_suppkey` and `test`.`partsupp`.`ps_suppkey` = `test`.`supplier`.`s_suppkey` and `test`.`partsupp`.`ps_partkey` = `test`.`part`.`p_partkey` and `test`.`part`.`p_partkey` < 130 and `test`.`partsupp`.`ps_comment` is not null |

10.8.6 with JSON_HB

+------+-------------+----------+--------+--------------------------------------------------------+--------------+---------+--------------------------+------+----------+-------------+
| id   | select_type | table    | type   | possible_keys                                          | key          | key_len | ref                      | rows | filtered | Extra       |
+------+-------------+----------+--------+--------------------------------------------------------+--------------+---------+--------------------------+------+----------+-------------+
|    1 | SIMPLE      | supplier | index  | PRIMARY                                                | PRIMARY      | 4       | NULL                     | 10   |   100.00 | Using index |
|    1 | SIMPLE      | partsupp | ref    | PRIMARY,i_ps_partkey,i_ps_suppkey                      | i_ps_suppkey | 4       | test.supplier.s_suppkey  | 70   |    40.00 | Using where |
|    1 | SIMPLE      | part     | eq_ref | PRIMARY                                                | PRIMARY      | 4       | test.partsupp.ps_partkey | 1    |   100.00 | Using index |
|    1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey  | 5       | test.supplier.s_suppkey  | 600  |   100.00 |             |
|    1 | SIMPLE      | orders   | eq_ref | PRIMARY,i_o_custkey                                    | PRIMARY      | 4       | test.lineitem.l_orderkey | 1    |   100.00 | Using where |
|    1 | SIMPLE      | customer | eq_ref | PRIMARY                                                | PRIMARY      | 4       | test.orders.o_custkey    | 1    |   100.00 | Using index |
+------+-------------+----------+--------+--------------------------------------------------------+--------------+---------+--------------------------+------+----------+-------------+
| Note  | 1003 | select count(0) AS `cnt` from `test`.`part` join `test`.`partsupp` join `test`.`supplier` join `test`.`lineitem` join `test`.`orders` join `test`.`customer` where `test`.`customer`.`c_custkey` = `test`.`orders`.`o_custkey` and `test`.`orders`.`o_orderkey` = `test`.`lineitem`.`l_orderkey` and `test`.`partsupp`.`ps_suppkey` = `test`.`supplier`.`s_suppkey` and `test`.`lineitem`.`l_suppkey` = `test`.`supplier`.`s_suppkey` and `test`.`part`.`p_partkey` = `test`.`partsupp`.`ps_partkey` and `test`.`partsupp`.`ps_partkey` < 130 and `test`.`partsupp`.`ps_comment` is not null |



 Comments   
Comment by Sergei Petrunia [ 2023-02-07 ]

On the current bb-11.0 tree, I get the same join order with JSON_HB and DOUBLE_PREC_HB.
It matches the join order that the report says 10.8 produces with JSON_HB:

+------+-------------+----------+--------+--------------------------------------------------------+--------------+---------+-----------------------------+------+----------+-------------+
| id   | select_type | table    | type   | possible_keys                                          | key          | key_len | ref                         | rows | filtered | Extra       |
+------+-------------+----------+--------+--------------------------------------------------------+--------------+---------+-----------------------------+------+----------+-------------+
|    1 | SIMPLE      | supplier | index  | PRIMARY                                                | PRIMARY      | 4       | NULL                        | 10   |   100.00 | Using index |
|    1 | SIMPLE      | partsupp | ref    | PRIMARY,i_ps_partkey,i_ps_suppkey                      | i_ps_suppkey | 4       | dbt3old.supplier.s_suppkey  | 70   |    39.69 | Using where |
|    1 | SIMPLE      | part     | eq_ref | PRIMARY                                                | PRIMARY      | 4       | dbt3old.partsupp.ps_partkey | 1    |   100.00 | Using index |
|    1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey  | 5       | dbt3old.supplier.s_suppkey  | 600  |   100.00 |             |
|    1 | SIMPLE      | orders   | eq_ref | PRIMARY,i_o_custkey                                    | PRIMARY      | 4       | dbt3old.lineitem.l_orderkey | 1    |   100.00 | Using where |
|    1 | SIMPLE      | customer | eq_ref | PRIMARY                                                | PRIMARY      | 4       | dbt3old.orders.o_custkey    | 1    |   100.00 | Using index |
+------+-------------+----------+--------+--------------------------------------------------------+--------------+---------+-----------------------------+------+----------+-------------+

It takes about 10 seconds on my machine on debug build.

Comment by Sergei Petrunia [ 2023-02-07 ]

I could reproduce the issue on 10.8.

Comment by Sergei Petrunia [ 2023-02-07 ]

Conditions that could be affected by histogram selectivity:

  • partsupp.ps_partkey < 130 – not using histograms, because there is an index. Range analyzer provides the estimate.
  • partsupp.ps_comment is not null – Does use histograms..

DOUBLE_PREC_HB: selectivity_from_histogram": 0.9921875
JSON_HB: selectivity_from_histogram": 1

In reality, the table has 700 rows and 0 rows have "ps_comment is null".

(in both cases, mysql.column_stats.nulls_ratio=0.000, I'm not sure where DOUBLE_PREC_HB gets 0.9921875 value from... 1/ (1- 0.9921875) = 128...

In any case, if the estimate is off by 1/128th, this doesn't look like a meaningful reason for 10x increase in query speed..

Comment by Sergei Petrunia [ 2023-02-07 ]

Trying the good join order on bb-11.0: mdev30514-bb-11.0-try-good-join-order.txt

It is also faster than what the optimizer chooses by default.
However, table "part" uses the join buffer. The optimizer doesn't take the use of join buffer into account (either in 11.0 or in 10.8, it's a long known deficiency in best_access_path).

If I set join_cache_level=0 to make the execution match what the optimizer estimates, the "good" join order becomes slower than the join order that the optimizer picks by default.

Generated at Thu Feb 08 10:16:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.