MariaDB Server / MDEV-4364

Histograms show the same selectivity for col=rare_value and col=frequent_value

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • None
    • 10.0.10
    • None

    Description

      Create the dataset as specified by MDEV-4363.

      MariaDB [j10]> set use_stat_tables='preferably';
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [j10]> set optimizer_use_condition_selectivity=4;
      Query OK, 0 rows affected, 1 warning (0.00 sec)

      The point of this test is to check skewed data distributions.
      The value of 178 is very frequent:

      MariaDB [j10]> select (select  count(*) from t5 where col2 = 178 ) /(select  count(*) from t5);
      +--------------------------------------------------------------------------+
      | (select  count(*) from t5 where col2 = 178 ) /(select  count(*) from t5) |
      +--------------------------------------------------------------------------+
      |                                                                   0.3301 |
      +--------------------------------------------------------------------------+
      1 row in set (0.18 sec)

      What EXPLAIN estimates:

      MariaDB [j10]> explain extended select count(*) from t5 where col2 = 178 ;
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      |    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.60 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)

      It estimates the selectivity at 3.6% (filtered = 3.60), while the actual fraction is about 33%.

      If I try the same with an ordinary, rare value:

       
      MariaDB [j10]> select (select  count(*) from t5 where col2 = 179 ) /(select  count(*) from t5);
      +--------------------------------------------------------------------------+
      | (select  count(*) from t5 where col2 = 179 ) /(select  count(*) from t5) |
      +--------------------------------------------------------------------------+
      |                                                                   0.0001 |
      +--------------------------------------------------------------------------+
      1 row in set (0.20 sec)
       
      MariaDB [j10]> explain extended select count(*) from t5 where col2 = 179 ;
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      |    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.60 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      1 row in set, 1 warning (0.00 sec)

      Both values get the same 3.60% estimate. It looks like histograms do not make it possible to distinguish between a frequent and a rare value?
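
The page does not record why this was resolved as Not a Bug. One thing worth ruling out in a report like this is that no histogram was collected for col2 at all, in which case the optimizer can only fall back on a per-column average and will give every value the same estimate. The following is a minimal sketch of such a check, assuming the j10 schema and t5 table from the session above; the histogram_size value is only illustrative.

```sql
-- If histogram_size is 0, ANALYZE stores no histogram for any column.
SELECT @@histogram_size, @@histogram_type;

-- Illustrative setting (an assumption, not taken from the report):
-- request a histogram and re-collect engine-independent statistics.
SET histogram_size = 255;
ANALYZE TABLE t5 PERSISTENT FOR ALL;

-- Inspect what was stored for col2. A hist_size of 0 or a NULL
-- histogram means only avg_frequency is available for estimates.
SELECT column_name, avg_frequency, hist_size, hist_type,
       DECODE_HISTOGRAM(hist_type, histogram) AS decoded
FROM mysql.column_stats
WHERE db_name = 'j10' AND table_name = 't5' AND column_name = 'col2';
```

If a histogram is present, re-running the two EXPLAIN EXTENDED statements shows whether the filtered value for col2 = 178 and col2 = 179 still coincides.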

            People

              psergei Sergei Petrunia