[MDEV-26750] Estimation for filtered rows is far off with JSON_HB histogram

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: N/A
Fix Version/s: 10.7.1
Component/s: Optimizer
Labels: None
Environment: preview-10.7-MDEV-26519-json-histograms c548019b

Description

drop table if exists t1;
create table t1 (c char(8)) engine=MyISAM;
insert into t1 values ('1x');
insert into t1 values ('1x');
insert into t1 values ('1xx');
insert into t1 values ('0xx');
insert into t1 select * from t1;
insert into t1 select * from t1;

set histogram_type= SINGLE_PREC_HB;
analyze table t1 persistent for all;
analyze select c from t1 where c > '1';

set histogram_type= DOUBLE_PREC_HB;
analyze table t1 persistent for all;
analyze select c from t1 where c > '1';

set histogram_type= JSON_HB;
analyze table t1 persistent for all;
analyze select c from t1 where c > '1';

# Cleanup
drop table t1;

SINGLE_PREC_HB
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |   16 |  16.00 |    74.90 |      75.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+

DOUBLE_PREC_HB
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |   16 |  16.00 |    75.00 |      75.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+

JSON_HB
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |   16 |  16.00 |    33.33 |      75.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+

Results from preview-10.7-MDEV-26519-json-histograms c548019b

Issue Links

is caused by

MDEV-21130 Histograms: use JSON as on-disk format

Closed

Activity

Sergei Petrunia added a comment - 2021-10-18 13:32

... but things go wrong in the phase where we compute the selectivity.

Sergei Petrunia added a comment - 2021-10-18 12:20

There are some general considerations about observed histogram precision, but let's tackle this particular case first.

Data distribution:

MariaDB [j4]> select c, count(*) from t1 group by c;
+------+----------+
| c    | count(*) |
+------+----------+
| 0xx  |        4 |
| 1x   |        8 |
| 1xx  |        4 |
+------+----------+

The histogram seems adequate:

{
  "histogram_hb_v2": [
    {
      "start": "0xx",
      "size": 0.25,
      "ndv": 1
    },
    {
      "start": "1x",
      "size": 0.5,
      "ndv": 1
    },
    {
      "start": "1xx",
      "end": "1xx",
      "size": 0.25,
      "ndv": 1
    }
  ]
}
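
As a rough cross-check of why this histogram should be able to produce an estimate close to 75% for c > '1', here is a minimal Python sketch that sums bucket sizes above the constant. It is not the server's code: the function name estimate_greater_than is hypothetical, plain Python string comparison stands in for the column's collation, and the half-bucket guess for a bucket with ndv > 1 that straddles the constant is an assumption made only for illustration.

```python
# Illustrative sketch only; not MariaDB's selectivity code.
# Buckets copied from the histogram shown in this comment.
HISTOGRAM = [
    {"start": "0xx", "size": 0.25, "ndv": 1},
    {"start": "1x", "size": 0.5, "ndv": 1},
    {"start": "1xx", "end": "1xx", "size": 0.25, "ndv": 1},
]


def estimate_greater_than(buckets, const):
    """Estimate the fraction of rows whose value is > const.

    Assumption: Python string ordering approximates the column collation.
    """
    selectivity = 0.0
    for i, bucket in enumerate(buckets):
        start = bucket["start"]
        # Upper bound of the bucket: the next bucket's start,
        # or the explicit "end" carried by the last bucket.
        if i + 1 < len(buckets):
            upper = buckets[i + 1]["start"]
        else:
            upper = bucket["end"]

        if start > const:
            # Every value in the bucket is above the constant.
            selectivity += bucket["size"]
        elif bucket["ndv"] == 1 or upper <= const:
            # Either the single value equals start (<= const),
            # or the whole bucket lies at or below the constant.
            continue
        else:
            # Constant falls inside a multi-value bucket:
            # assume half of the bucket qualifies (illustrative guess).
            selectivity += bucket["size"] / 2
    return selectivity


print(estimate_greater_than(HISTOGRAM, "1"))  # 0.75
```

With these buckets the sketch counts the "1x" and "1xx" buckets in full (0.5 + 0.25), which matches the r_filtered value of 75.00 in the ANALYZE output and is far from the 33.33 that JSON_HB reported.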

Elena Stepanova added a comment - 2021-10-02 22:38

Raised priority for procedural reasons

People

Assignee: Sergei Petrunia

Reporter: Elena Stepanova

Votes: 0

Watchers: 2

Dates

Created: 2021-10-02 22:37

Updated: 2022-01-19 15:10

Resolved: 2021-10-18 13:32

MariaDB Server