Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4144

simple subquery causes full scan instead of range scan

Details

    Description

      Test case:

      --source include/have_innodb.inc
       
      CREATE TABLE t (
        id int not null auto_increment,
        x int not null,
        primary key(id)
      )engine=innodb;
       
      insert into t (x) values(0),(0),(0);
      insert into t (x) select 0 from t as t1,t as t2;
      insert into t (x) select 0 from t as t1,t as t2;
      insert into t (x) select 0 from t as t1,t as t2;
       
      SELECT (SELECT MAX(id) - 1000 FROM t) INTO @a;
      FLUSH STATUS;
      SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
      SHOW STATUS LIKE 'handler_read%';
      FLUSH STATUS;
      SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
      SHOW STATUS LIKE 'handler_read%';
      DROP TABLE t;

      output on mariadb (tested 5.3.5, 5.5.28, 5.5.29 and 10.0.0):

      mysql [localhost] {msandbox} (test) > SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
      +---+
      | x |
      +---+
      | 0 |
      +---+
      1 row in set (0.00 sec)
       
      mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'handler_read%';
      +--------------------------+-------+
      | Variable_name            | Value |
      +--------------------------+-------+
      | Handler_read_first       | 0     |
      | Handler_read_key         | 1     |
      | Handler_read_last        | 0     |
      | Handler_read_next        | 1000  |
      | Handler_read_prev        | 0     |
      | Handler_read_rnd         | 0     |
      | Handler_read_rnd_deleted | 0     |
      | Handler_read_rnd_next    | 0     |
      +--------------------------+-------+
      8 rows in set (0.00 sec)
       
      mysql [localhost] {msandbox} (test) > FLUSH STATUS;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql [localhost] {msandbox} (test) > SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x DESC LIMIT 1;
      +---+
      | x |
      +-- +
      | 0 |
      +---+
      1 row in set (0.00 sec)
       
      mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'handler_read%';
      +--------------------------+-------+
      | Variable_name            | Value |
      +--------------------------+-------+
      | Handler_read_first       | 0     |
      | Handler_read_key         | 0     |
      | Handler_read_last        | 1     |
      | Handler_read_next        | 0     |
      | Handler_read_prev        | 0     |
      | Handler_read_rnd         | 0     |
      | Handler_read_rnd_deleted | 0     |
      | Handler_read_rnd_next    | 24493 |
      +--------------------------+-------+

      output on mysql (tested 5.0 and 5.1):

      mysql> SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
      +---+
      | x |
      +---+
      | 0 |
      +---+
      1 row in set (0.00 sec)
       
      mysql> SHOW STATUS LIKE 'handler_read%';
      +-----------------------+-------+
      | Variable_name         | Value |
      +-----------------------+-------+
      | Handler_read_first    | 0     |
      | Handler_read_key      | 3     |
      | Handler_read_next     | 1000  |
      | Handler_read_prev     | 0     |
      | Handler_read_rnd      | 0     |
      | Handler_read_rnd_next | 0     |
      +-----------------------+-------+
      6 rows in set (0.00 sec)
       
      mysql> FLUSH STATUS;
      Query OK, 0 rows affected (0.00 sec)
       
      mysql> SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
      +---+
      | x |
      +---+
      | 0 |
      +---+
      1 row in set (0.00 sec)
       
      mysql> SHOW STATUS LIKE 'handler_read%';
      +-----------------------+-------+
      | Variable_name         | Value |
      +-----------------------+-------+
      | Handler_read_first    | 0     |
      | Handler_read_key      | 5     |
      | Handler_read_next     | 1000  |
      | Handler_read_prev     | 0     |
      | Handler_read_rnd      | 0     |
      | Handler_read_rnd_next | 0     |
      +-----------------------+-------+

      Attachments

        Activity

          Agree.. The subquery is considered expensive, and range optimizer doesn't consider the "id > (SELECT ...)" predicate.

          psergei Sergei Petrunia added a comment - Agree.. The subquery is considered expensive, and range optimizer doesn't consider the "id > (SELECT ...)" predicate.

          What is weird, is why is this subquery considered expensive. The code in item_subselect.cc: Item_subselect::is_expensive() looks as if it was written to be able to distinguish between the expensive and in-expensive subqueries. Yet, debugging shows that execution in Item_subselect::is_expensive() returns here:

          /* If a subquery is not optimized we cannot estimate its cost. */
          if (!cur_join->join_tab)
          return true;

          I'm wondering whether Patryk's addition may hit the subqueries that are not-yet-optimized (Do those subqueries may have cur_join->table_count == cur_join->const_tables?).

          Will discuss with Timour (his code).

          psergei Sergei Petrunia added a comment - What is weird, is why is this subquery considered expensive. The code in item_subselect.cc: Item_subselect::is_expensive() looks as if it was written to be able to distinguish between the expensive and in-expensive subqueries. Yet, debugging shows that execution in Item_subselect::is_expensive() returns here: /* If a subquery is not optimized we cannot estimate its cost. */ if (!cur_join->join_tab) return true; I'm wondering whether Patryk's addition may hit the subqueries that are not-yet-optimized (Do those subqueries may have cur_join->table_count == cur_join->const_tables?). Will discuss with Timour (his code).

          There's actually 'optimized' flag in join so maybe it should be something like this:

          — sql/item_subselect.cc 2013-03-17 10:41:25 +0000
          +++ sql/item_subselect.cc 2013-03-27 14:03:06 +0000
          @@ -548,8 +548,11 @@
          continue;

          /* If a subquery is not optimized we cannot estimate its cost. */
          + if (!cur_join->optimized)
          + return true;
          +
          if (!cur_join->join_tab)

          • return true;
            + continue;

          if (sl->first_inner_unit())
          {

          pomyk Patryk Pomykalski added a comment - There's actually 'optimized' flag in join so maybe it should be something like this: — sql/item_subselect.cc 2013-03-17 10:41:25 +0000 +++ sql/item_subselect.cc 2013-03-27 14:03:06 +0000 @@ -548,8 +548,11 @@ continue; /* If a subquery is not optimized we cannot estimate its cost. */ + if (!cur_join->optimized) + return true; + if (!cur_join->join_tab) return true; + continue; if (sl->first_inner_unit()) {

          Patryk, you are very close to the solution, however JOIN::optimized is set in the very beginning of JOIN::optimize.
          As a result it may happen that JOIN::optimize didn't produce an executable plan. At the same time, one of the
          ideas behind ::is_expensive() is to tell the optimizer that an expression is not expensive, and thus it can be evaluated
          during optimization. So this method should rule out subquery plans that cannot or should not be evaluated during optimization.

          There are much more extensive flags that describe the state of a query plan in a development branch of a 10.0 feature,
          however these cannot be backported to a GA version.

          timour Timour Katchaounov (Inactive) added a comment - Patryk, you are very close to the solution, however JOIN::optimized is set in the very beginning of JOIN::optimize. As a result it may happen that JOIN::optimize didn't produce an executable plan. At the same time, one of the ideas behind ::is_expensive() is to tell the optimizer that an expression is not expensive, and thus it can be evaluated during optimization. So this method should rule out subquery plans that cannot or should not be evaluated during optimization. There are much more extensive flags that describe the state of a query plan in a development branch of a 10.0 feature, however these cannot be backported to a GA version.

          Pushed to 5.5.31

          timour Timour Katchaounov (Inactive) added a comment - Pushed to 5.5.31

          People

            timour Timour Katchaounov (Inactive)
            pomyk Patryk Pomykalski
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.