Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26750

Estimation for filtered rows is far off with JSON_HB histogram

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • N/A
    • 10.7.1
    • Optimizer
    • None
    • preview-10.7-MDEV-26519-json-histograms c548019b

    Description

      drop table if exists t1;
       
      create table t1 (c char(8)) engine=MyISAM;
       
      insert into t1 values ('1x');
      insert into t1 values ('1x');
      insert into t1 values ('1xx');
      insert into t1 values ('0xx');
      insert into t1 select * from t1;
      insert into t1 select * from t1;
       
      set histogram_type= SINGLE_PREC_HB;
      analyze table t1 persistent for all;
      analyze
      select c from t1 where c > '1';
       
      set histogram_type= DOUBLE_PREC_HB;
      analyze table t1 persistent for all;
      analyze
      select c from t1 where c > '1';
       
      set histogram_type= JSON_HB;
      analyze table t1 persistent for all;
      analyze
      select c from t1 where c > '1';
       
      # Cleanup
      drop table t1;
      

      SINGLE_PREC_HB

      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    74.90 |      75.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      

      DOUBLE_PREC_HB

      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    75.00 |      75.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      

      JSON_HB

      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 16   | 16.00  |    33.33 |      75.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+--------+----------+------------+-------------+
      

      Results from preview-10.7-MDEV-26519-json-histograms c548019b

      Attachments

        Issue Links

          Activity

            Raised priority on procedural reasons

            elenst Elena Stepanova added a comment - Raised priority on procedural reasons

            There are some general considerations about observed histogram precision, but let's tackle this particular case, first.

            Data distribution:

            MariaDB [j4]> select c, count(*) from t1 group by c;
            +------+----------+
            | c    | count(*) |
            +------+----------+
            | 0xx  |        4 |
            | 1x   |        8 |
            | 1xx  |        4 |
            +------+----------+
            

            The histogram seems adequate:

            {
              "histogram_hb_v2": [
                {
                  "start": "0xx",
                  "size": 0.25,
                  "ndv": 1
                },
                {
                  "start": "1x",
                  "size": 0.5,
                  "ndv": 1
                },
                {
                  "start": "1xx",
                  "end": "1xx",
                  "size": 0.25,
                  "ndv": 1
                }
              ]
            }
            

            psergei Sergei Petrunia added a comment - There are some general considerations about observed histogram precision, but let's tackle this particular case, first. Data distribution: MariaDB [j4]> select c, count(*) from t1 group by c; +------+----------+ | c | count(*) | +------+----------+ | 0xx | 4 | | 1x | 8 | | 1xx | 4 | +------+----------+ The histogram seems adequate: { "histogram_hb_v2": [ { "start": "0xx", "size": 0.25, "ndv": 1 }, { "start": "1x", "size": 0.5, "ndv": 1 }, { "start": "1xx", "end": "1xx", "size": 0.25, "ndv": 1 } ] }

            .. but things go wrong at the phase where we are computing the selectivity.

            psergei Sergei Petrunia added a comment - .. but things go wrong at the phase where we are computing the selectivity.

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.