[MDEV-26750] Estimation for filtered rows is far off with JSON_HB histogram Created: 2021-10-02  Updated: 2022-01-19  Resolved: 2021-10-18

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 10.7.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None
Environment:

preview-10.7-MDEV-26519-json-histograms c548019b


Issue Links:
Problem/Incident
is caused by MDEV-21130 Histograms: use JSON as on-disk format Closed

 Description   

drop table if exists t1;
 
create table t1 (c char(8)) engine=MyISAM;
 
insert into t1 values ('1x');
insert into t1 values ('1x');
insert into t1 values ('1xx');
insert into t1 values ('0xx');
insert into t1 select * from t1;
insert into t1 select * from t1;
 
set histogram_type= SINGLE_PREC_HB;
analyze table t1 persistent for all;
analyze
select c from t1 where c > '1';
 
set histogram_type= DOUBLE_PREC_HB;
analyze table t1 persistent for all;
analyze
select c from t1 where c > '1';
 
set histogram_type= JSON_HB;
analyze table t1 persistent for all;
analyze
select c from t1 where c > '1';
 
# Cleanup
drop table t1;

SINGLE_PREC_HB

+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    74.90 |      75.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+

DOUBLE_PREC_HB

+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    75.00 |      75.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+

JSON_HB

+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    33.33 |      75.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+

Results from preview-10.7-MDEV-26519-json-histograms c548019b



 Comments   
Comment by Elena Stepanova [ 2021-10-02 ]

Raised priority on procedural reasons

Comment by Sergei Petrunia [ 2021-10-18 ]

There are some general considerations about observed histogram precision, but let's tackle this particular case, first.

Data distribution:

MariaDB [j4]> select c, count(*) from t1 group by c;
+------+----------+
| c    | count(*) |
+------+----------+
| 0xx  |        4 |
| 1x   |        8 |
| 1xx  |        4 |
+------+----------+

The histogram seems adequate:

{
  "histogram_hb_v2": [
    {
      "start": "0xx",
      "size": 0.25,
      "ndv": 1
    },
    {
      "start": "1x",
      "size": 0.5,
      "ndv": 1
    },
    {
      "start": "1xx",
      "end": "1xx",
      "size": 0.25,
      "ndv": 1
    }
  ]
}

Comment by Sergei Petrunia [ 2021-10-18 ]

.. but things go wrong at the phase where we are computing the selectivity.

Generated at Thu Feb 08 09:47:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.