Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17761

Odd optimizer choice with ORDER BY LIMIT and condition selectivity

    Details

      Description

      The ORDER BY...LIMIT optimizer displays strange effects when one adds
      histogram and enables use_condition_selectivity.

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
      create table ten_k(a int);
      insert into ten_k select A.a + 1000 *B.a from one_k A, ten B;
       
      create table t12 (
        a int,
        b int,
        c int,
        filler1 char(255),
        filler2 char(255),
        key(a)
      );
      insert into t12 select a,a,a, a,a from ten_k;
      

      With current default settings and @@optimizer_use_condition_selectivity=1:

      mysql> explain extended select * from t12 where b < 5000;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 9646 |   100.00 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      

      Ok, filtered=100%, the optimizer has no clue about selectivity.

      Now, the ORDER BY ... LIMIT query:

      mysql> explain extended select * from t12 where b < 5000 order by a limit 600;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------+
      |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 9646 |   100.00 | Using where; Using filesort |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------+
      

      In order to read 600 rows it will use filesort. (if one uses a lower LIMIT value, e.g. "LIMIT 400", the optimizer will use the
      index).

      Now, let's give optimizer a clue about the condition selectivity:

      set histogram_size=100;
      set use_stat_tables=preferably;
      set optimizer_use_condition_selectivity=4;
      analyze table t12 persistent for columns(b) indexes ();
      

      Now, the optimizer knows about condition selectivity:

      mysql> explain extended select * from t12 where b < 5000 ;
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      |    1 | SIMPLE      | t12   | ALL  | NULL          | NULL | NULL    | NULL | 10000 |    50.50 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
      

      The query plan for the ORDER BY...LIMIT query becomes:

      mysql> explain extended select * from t12 where b < 5000 order by a limit 600;
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t12   | index | NULL          | a    | 5       | NULL |  600 |   100.00 | Using where |
      +------+-------------+-------+-------+---------------+------+---------+------+------+----------+-------------+
      

      The odd parts about this are:
      (Minor) filtered=100%, although the optimizer has information about the condition selectivity.

      (Major) Why did the query plan change from using filesort to using an index?
      selectivity=50% which means we'll need to scan 2x more rows before we find
      #LIMIT matching rows.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                1 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated: