MariaDB Server / MDEV-4364

Histograms show the same selectivity for col=rare_value and col=frequent_value

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: None
    • Fix Version/s: 10.0.10
    • Component/s: None
    • Labels:

      Description

      Create the dataset as specified by MDEV-4363.

      MariaDB [j10]> set use_stat_tables='preferably';
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [j10]> set optimizer_use_condition_selectivity=4;
      Query OK, 0 rows affected, 1 warning (0.00 sec)

      The point of this test is to check skewed data distributions.
      The value 178 is very frequent, accounting for about 33% of the rows:

      MariaDB [j10]> select (select  count(*) from t5 where col2 = 178 ) /(select  count(*) from t5);
      +--------------------------------------------------------------------------+
      | (select  count(*) from t5 where col2 = 178 ) /(select  count(*) from t5) |
      +--------------------------------------------------------------------------+
      |                                                                   0.3301 |
      +--------------------------------------------------------------------------+
      1 row in set (0.18 sec)

      What EXPLAIN estimates:

      MariaDB [j10]> explain extended select count(*) from t5 where col2 = 178 ;
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      |    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.60 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)

      EXPLAIN estimates the selectivity at 3.6% (filtered = 3.60), although the actual fraction is about 33%.

      If I try it with an ordinary, rare value:

      MariaDB [j10]> select (select  count(*) from t5 where col2 = 179 ) /(select  count(*) from t5);
      +--------------------------------------------------------------------------+
      | (select  count(*) from t5 where col2 = 179 ) /(select  count(*) from t5) |
      +--------------------------------------------------------------------------+
      |                                                                   0.0001 |
      +--------------------------------------------------------------------------+
      1 row in set (0.20 sec)
       
      MariaDB [j10]> explain extended select count(*) from t5 where col2 = 179 ;
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      |    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.60 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)

      Both queries get the same filtered estimate of 3.60%. It looks like histograms cannot distinguish between a frequent value and a rare one?
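
      One way to narrow this down is to check whether a histogram was actually stored for col2. The statements below are a sketch and not part of the original report. They assume the table lives in the j10 schema (as the prompt suggests) and that the engine-independent statistics are kept in mysql.column_stats. The histogram_size value of 100 is an arbitrary illustration.

      ```sql
      -- Assumption: histogram_size may currently be 0, in which case
      -- ANALYZE ... PERSISTENT stores column statistics but no histogram.
      SET histogram_size = 100;   -- illustrative value; the valid range is 0..255

      -- Re-collect engine-independent statistics for all columns and indexes of t5.
      ANALYZE TABLE t5 PERSISTENT FOR ALL;

      -- Inspect what was stored for col2, including the decoded bucket widths.
      SELECT column_name, avg_frequency, hist_size, hist_type,
             decode_histogram(hist_type, histogram) AS buckets
      FROM mysql.column_stats
      WHERE db_name = 'j10' AND table_name = 't5' AND column_name = 'col2';
      ```

      If hist_size comes back as 0 or NULL, no histogram was stored for the column, and identical estimates for col2 = 178 and col2 = 179 would not say anything about the histogram code itself.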

              People

              Assignee:
              psergey Sergei Petrunia
              Reporter:
              psergey Sergei Petrunia
