Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26750

Estimation for filtered rows is far off with JSON_HB histogram

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • N/A
    • 10.7.1
    • Optimizer
    • None
    • preview-10.7-MDEV-26519-json-histograms c548019b

    Description

      drop table if exists t1;
       
      create table t1 (c char(8)) engine=MyISAM;
       
      insert into t1 values ('1x');
      insert into t1 values ('1x');
      insert into t1 values ('1xx');
      insert into t1 values ('0xx');
      insert into t1 select * from t1;
      insert into t1 select * from t1;
       
      set histogram_type= SINGLE_PREC_HB;
      analyze table t1 persistent for all;
      analyze
      select c from t1 where c > '1';
       
      set histogram_type= DOUBLE_PREC_HB;
      analyze table t1 persistent for all;
      analyze
      select c from t1 where c > '1';
       
      set histogram_type= JSON_HB;
      analyze table t1 persistent for all;
      analyze
      select c from t1 where c > '1';
       
      # Cleanup
      drop table t1;
      

      SINGLE_PREC_HB

      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    74.90 |      75.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      

      DOUBLE_PREC_HB

      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    75.00 |      75.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      

      JSON_HB

      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    33.33 |      75.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      

      Results from preview-10.7-MDEV-26519-json-histograms c548019b

      Attachments

        Issue Links

          Activity

            .. but things go wrong at the phase where we are computing the selectivity.

            psergei Sergei Petrunia added a comment - .. but things go wrong at the phase where we are computing the selectivity.

            There are some general considerations about observed histogram precision, but let's tackle this particular case, first.

            Data distribution:

            MariaDB [j4]> select c, count(*) from t1 group by c;
            +------+----------+
            | c    | count(*) |
            +------+----------+
            | 0xx  |        4 |
            | 1x   |        8 |
            | 1xx  |        4 |
            +------+----------+
            

            The histogram seems adequate:

            {
              "histogram_hb_v2": [
                {
                  "start": "0xx",
                  "size": 0.25,
                  "ndv": 1
                },
                {
                  "start": "1x",
                  "size": 0.5,
                  "ndv": 1
                },
                {
                  "start": "1xx",
                  "end": "1xx",
                  "size": 0.25,
                  "ndv": 1
                }
              ]
            }
            

            psergei Sergei Petrunia added a comment - There are some general considerations about observed histogram precision, but let's tackle this particular case, first. Data distribution: MariaDB [j4]> select c, count(*) from t1 group by c; +------+----------+ | c | count(*) | +------+----------+ | 0xx | 4 | | 1x | 8 | | 1xx | 4 | +------+----------+ The histogram seems adequate: { "histogram_hb_v2": [ { "start": "0xx", "size": 0.25, "ndv": 1 }, { "start": "1x", "size": 0.5, "ndv": 1 }, { "start": "1xx", "end": "1xx", "size": 0.25, "ndv": 1 } ] }

            Raised priority on procedural reasons

            elenst Elena Stepanova added a comment - Raised priority on procedural reasons

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.