Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26125

Histograms: limited histogram precision causes imprecise estimates

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
    • Fix Version/s: 10.7
    • Component/s: Optimizer
    • Labels:
      None

      Description

      This is a known limitation but I'm filing a testcase for the record.

      Generate a 1-millon row population with column "Country" having value distribution according to the "world" database:

      source mysql-test/include/world_schema.inc 
      source mysql-test/include/world.inc 
       
      create table generated_pop (
        pk int auto_increment primary key,
        Country char(52)
      );
      

      set @total_pop= (select Sum(population) from Country) - 1;
       
      create table tmp1 as 
      SELECT 
        name, 
        SUM(Population) over (order by Name) as cumulative_pop
      FROM Country;
      

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int primary key);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
      

      create table rand_numbers as
      select 
        rand() * @total_pop as col
      from 
        one_k A, one_k B;
       
      insert into generated_pop (Country)
       select 
        (select name from tmp1 
         where cumulative_pop > col 
         order by cumulative_pop limit 1
        )
      from rand_numbers;
      

      analyze table generated_pop persistent for all;
      

      mysql> select count(*) from generated_pop where country='China';
      +----------+
      | count(*) |
      +----------+
      |   210306 |
      +----------+
      1 row in set (3.00 sec)
       
      mysql> select count(*) from generated_pop where country='Chile';
      +----------+
      | count(*) |
      +----------+
      |     2479 |
      +----------+
      1 row in set (3.49 sec)
      

      As expected, China has a much larger population than Chile.

      But histogram doesn't allow the optimizer to find this out:

      mysql> explain extended select  * from generated_pop where country='China';
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 |    21.09 | Using where |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      1 row in set, 1 warning (0.01 sec)
       
      mysql> explain extended select  * from generated_pop where country='Chile';
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      | id   | select_type | table         | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      |    1 | SIMPLE      | generated_pop | ALL  | NULL          | NULL | NULL    | NULL | 1000000 |    21.09 | Using where |
      +------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Unassigned
              Reporter:
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:

                  Git Integration