Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26125

Histograms: limited histogram precision causes imprecise estimates

Details

    Description

      This is a known limitation but I'm filing a testcase for the record.

      Generate a 1-millon row population with column "Country" having value distribution according to the "world" database:

      source mysql-test/include/world_schema.inc 
      source mysql-test/include/world.inc 
       
      create table generated_pop (
        pk int auto_increment primary key,
        Country char(52)
      );
      

      set @total_pop= (select Sum(population) from Country) - 1;
       
      create table tmp1 as 
      SELECT 
        name, 
        SUM(Population) over (order by Name) as cumulative_pop
      FROM Country;
      

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int primary key);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
      

      create table rand_numbers as
      select 
        rand() * @total_pop as col
      from 
        one_k A, one_k B;
       
      insert into generated_pop (Country)
       select 
        (select name from tmp1 
         where cumulative_pop > col 
         order by cumulative_pop limit 1
        )
      from rand_numbers;
      

      analyze table generated_pop persistent for all;
      

      mysql> select count(*) from generated_pop where country='China';
      +----------+
      | count(*) |
      +----------+
      |   210306 |
      +----------+
      1 row in set (3.00 sec)
       
      mysql> select count(*) from generated_pop where country='Chile';
      +----------+
      | count(*) |
      +----------+
      |     2479 |
      +----------+
      1 row in set (3.49 sec)
      

      As expected, China has a much larger population than Chile.

      But histogram doesn't allow the optimizer to find this out:

      mysql> explain extended select  * from generated_pop where country='China';
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 |    21.09 | Using where |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      1 row in set, 1 warning (0.01 sec)
       
      mysql> explain extended select  * from generated_pop where country='Chile';
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 |    21.09 | Using where |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)
      

      Attachments

        Issue Links

          Activity

            MySQL 8, for comparison:
            (not exactly the same data due to different seed for RAND, real numbers are: 210369 in China and 2509 in Chile)

            mysql> explain select  * from generated_pop where country='China'\G
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: generated_pop
               partitions: NULL
                     type: ALL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: 917136
                 filtered: 21.07
                    Extra: Using where
            1 row in set, 1 warning (0.00 sec)
             
            mysql> explain select  * from generated_pop where country='Chile'\G
            *************************** 1. row ***************************
                       id: 1
              select_type: SIMPLE
                    table: generated_pop
               partitions: NULL
                     type: ALL
            possible_keys: NULL
                      key: NULL
                  key_len: NULL
                      ref: NULL
                     rows: 917136
                 filtered: 0.00
                    Extra: Using where
            1 row in set, 1 warning (0.00 sec)
            

            psergei Sergei Petrunia added a comment - MySQL 8, for comparison: (not exactly the same data due to different seed for RAND, real numbers are: 210369 in China and 2509 in Chile) mysql> explain select * from generated_pop where country='China'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: generated_pop partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 917136 filtered: 21.07 Extra: Using where 1 row in set, 1 warning (0.00 sec)   mysql> explain select * from generated_pop where country='Chile'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: generated_pop partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 917136 filtered: 0.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)
            rxrog Michael Okoko added a comment -

            Using the build from PR-1854, this is the output using JSON histograms:

            MariaDB [pop_test]> set histogram_type=json;
            Query OK, 0 rows affected (0.000 sec)
             
            MariaDB [pop_test]> analyze table generated_pop persistent for all;
            +------------------------+---------+----------+-----------------------------------------+
            | Table                  | Op      | Msg_type | Msg_text                                |
            +------------------------+---------+----------+-----------------------------------------+
            | pop_test.generated_pop | analyze | status   | Engine-independent statistics collected |
            | pop_test.generated_pop | analyze | status   | OK                                      |
            +------------------------+---------+----------+-----------------------------------------+
            2 rows in set (6.719 sec)
             
            MariaDB [pop_test]> explain extended select  * from generated_pop where country='Chile';
            +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
            | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
            |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 |     0.78 | Using where |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
            1 row in set, 1 warning (0.001 sec)
             
            MariaDB [pop_test]> explain extended select  * from generated_pop where country='China';
            +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
            | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
            |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 |    21.18 | Using where |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
            1 row in set, 1 warning (0.001 sec)
             
            MariaDB [pop_test]>
            

            rxrog Michael Okoko added a comment - Using the build from PR-1854 , this is the output using JSON histograms: MariaDB [pop_test]> set histogram_type=json; Query OK, 0 rows affected (0.000 sec)   MariaDB [pop_test]> analyze table generated_pop persistent for all; +------------------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +------------------------+---------+----------+-----------------------------------------+ | pop_test.generated_pop | analyze | status | Engine-independent statistics collected | | pop_test.generated_pop | analyze | status | OK | +------------------------+---------+----------+-----------------------------------------+ 2 rows in set (6.719 sec)   MariaDB [pop_test]> explain extended select * from generated_pop where country='Chile'; +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 0.78 | Using where | +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.001 sec)   MariaDB [pop_test]> explain extended select * from generated_pop where country='China'; +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 21.18 | Using where | +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.001 sec)   MariaDB [pop_test]>

            Re-trying with the latest 10.8 tree with JSON_HB Histograms:

            MariaDB [pop1]> analyze select * from generated_pop where country='China';
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra       |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | 1000000.00 |    21.04 |      21.04 | Using where |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            1 row in set (0.164 sec)
            

            MariaDB [pop1]> analyze select * from generated_pop where country='Chile';
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra       |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | 1000000.00 |     0.13 |       0.25 | Using where |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            1 row in set (0.157 sec)
            

            MariaDB [pop1]> analyze select * from generated_pop where country='Sweden';
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra       |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | 1000000.00 |     0.08 |       0.15 | Using where |
            +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
            1 row in set (0.157 sec)
            

            psergei Sergei Petrunia added a comment - Re-trying with the latest 10.8 tree with JSON_HB Histograms: MariaDB [pop1]> analyze select * from generated_pop where country='China'; +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ | 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 1000000.00 | 21.04 | 21.04 | Using where | +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ 1 row in set (0.164 sec) MariaDB [pop1]> analyze select * from generated_pop where country='Chile'; +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ | 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 1000000.00 | 0.13 | 0.25 | Using where | +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ 1 row in set (0.157 sec) MariaDB [pop1]> analyze select * from generated_pop where country='Sweden'; +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ | 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 1000000.00 | 0.08 | 0.15 | Using where | +------+-------------+---------------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+ 1 row in set (0.157 sec)

            Fixed in 10.8 by MDEV-21130, JSON_HB histograms.

            psergei Sergei Petrunia added a comment - Fixed in 10.8 by MDEV-21130 , JSON_HB histograms.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.