Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6041

ORDER BY+subqueries: subquery_table.key=outer_table.col is not recongized as binding

Details

    Description

      Create a test dataset:

      create table t1 (a int);
      insert into t1 select * from test.one_k;
       
      create table tsubq(
        id int primary key,
        key1 int,
        col1 int,
        key(key1)
      ) engine=innodb;
       
      insert into tsubq 
        select A.a + B.a*1000, A.a, 123456 from test.one_k A, test.one_k B;

      Then, check the plan:

      explain select 
         (SELECT 
            concat(id, '-', key1, '-', col1)
          FROM tsubq
          WHERE tsubq.key1 = t1.a
          ORDER BY tsubq.id ASC LIMIT 1)
      from 
        t1;

      +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
      | id   | select_type        | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
      +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
      |    1 | PRIMARY            | t1    | ALL   | NULL          | NULL    | NULL    | NULL | 1000 |             |
      |    2 | DEPENDENT SUBQUERY | tsubq | index | key1          | PRIMARY | 4       | NULL |    1 | Using where |
      +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+

      The subquery uses "index" access, which is very inefficient. The estimate for #rows seems to come from the LIMIT clause and is very wrong in this case.

      The table is InnoDB (with extended keys). The index KEY(key1) is actually KEY(key1, id). The query has a restriction on key1 which makes it constant (tsubq.key1 = t1.a). After that, ORDER BY tsubq.id is achieved automatically.

      The problem seems to be specifically with references to outside of subquery. If I use a constant instead, the query plan is able to use key1:

      explain select     (SELECT        concat(id, '-', key1, '-', col1)     FROM tsubq     WHERE tsubq.key1 = 333     ORDER BY tsubq.id ASC LIMIT 1) from    t1;
      +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
      +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
      |    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL  | 1000 |             |
      |    2 | SUBQUERY    | tsubq | ref  | key1          | key1 | 5       | const |  999 | Using where |
      +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            Assignee Sergei Petrunia [ psergey ]

            Initial investigation:

            The optimizer fails to detect that sorting is not needed, because TABLE::const_key_parts[ key1] = 0.

            I'm wondering if the fix is to make const_key_parts=1 for cases like this.

            TABLE::const_key_parts is modified by

            • reset by make_join_statistics
            • set by sort_and_filter_keyuse
            • update_const_equal_items sets too
            • TABLE::update_const_key_parts() for UPDATE.

            Checked by

            • test_if_order_by_key()

            Suggestion: change the meaning of "const_key_parts" to be "including
            OUTER_REF_TABLE_BIT".

            psergei Sergei Petrunia added a comment - Initial investigation: The optimizer fails to detect that sorting is not needed, because TABLE::const_key_parts[ key1] = 0. I'm wondering if the fix is to make const_key_parts=1 for cases like this. TABLE::const_key_parts is modified by reset by make_join_statistics set by sort_and_filter_keyuse update_const_equal_items sets too TABLE::update_const_key_parts() for UPDATE. Checked by test_if_order_by_key() Suggestion: change the meaning of "const_key_parts" to be "including OUTER_REF_TABLE_BIT".
            psergei Sergei Petrunia made changes -
            Description {noformat}
            create table t1 (a int);
            insert into t1 select * from test.one_k;

            create table tsubq(
              id int primary key,
              key1 int,
              col1 int,
              key(key1)
            );

            insert into tsubq
              select A.a + B.a*1000, A.a, 123456 from test.one_k A, test.one_k B;
            {noformat}

            {noformat}
            explain select
               (SELECT
                  concat(id, '-', key1, '-', col1)
                FROM tsubq
                WHERE tsubq.key1 = t1.a
                ORDER BY tsubq.id ASC LIMIT 1)
            from
              t1;
            {noformat}


            {noformat}
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | DEPENDENT SUBQUERY | tsubq | index | key1 | PRIMARY | 4 | NULL | 1 | Using where |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            {noformat}

            The above is wrong. Note also a very wrong estimate for #rows.

            If I change outer reference to constant, the EXPLAIN is more reasonable:

            {noformat}
            explain select (SELECT concat(id, '-', key1, '-', col1) FROM tsubq WHERE tsubq.key1 = 333 ORDER BY tsubq.id ASC LIMIT 1) from t1;
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | SUBQUERY | tsubq | ref | key1 | key1 | 5 | const | 999 | Using where |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            {noformat}
            Create a test dataset:
            {noformat}
            create table t1 (a int);
            insert into t1 select * from test.one_k;

            create table tsubq(
              id int primary key,
              key1 int,
              col1 int,
              key(key1)
            );

            insert into tsubq
              select A.a + B.a*1000, A.a, 123456 from test.one_k A, test.one_k B;
            {noformat}

            Then, check the plan:
            {noformat}
            explain select
               (SELECT
                  concat(id, '-', key1, '-', col1)
                FROM tsubq
                WHERE tsubq.key1 = t1.a
                ORDER BY tsubq.id ASC LIMIT 1)
            from
              t1;
            {noformat}


            {noformat}
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | DEPENDENT SUBQUERY | tsubq | index | key1 | PRIMARY | 4 | NULL | 1 | Using where |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            {noformat}

            The subquery uses "index" access, which is very inefficient. The estimate for #rows seems to come from the LIMIT clause and is very wrong in this case.
             
            The query is not using a plan

            If I change outer reference to constant, the EXPLAIN is more reasonable:

            {noformat}
            explain select (SELECT concat(id, '-', key1, '-', col1) FROM tsubq WHERE tsubq.key1 = 333 ORDER BY tsubq.id ASC LIMIT 1) from t1;
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | SUBQUERY | tsubq | ref | key1 | key1 | 5 | const | 999 | Using where |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            {noformat}
            psergei Sergei Petrunia made changes -
            Description Create a test dataset:
            {noformat}
            create table t1 (a int);
            insert into t1 select * from test.one_k;

            create table tsubq(
              id int primary key,
              key1 int,
              col1 int,
              key(key1)
            );

            insert into tsubq
              select A.a + B.a*1000, A.a, 123456 from test.one_k A, test.one_k B;
            {noformat}

            Then, check the plan:
            {noformat}
            explain select
               (SELECT
                  concat(id, '-', key1, '-', col1)
                FROM tsubq
                WHERE tsubq.key1 = t1.a
                ORDER BY tsubq.id ASC LIMIT 1)
            from
              t1;
            {noformat}


            {noformat}
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | DEPENDENT SUBQUERY | tsubq | index | key1 | PRIMARY | 4 | NULL | 1 | Using where |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            {noformat}

            The subquery uses "index" access, which is very inefficient. The estimate for #rows seems to come from the LIMIT clause and is very wrong in this case.
             
            The query is not using a plan

            If I change outer reference to constant, the EXPLAIN is more reasonable:

            {noformat}
            explain select (SELECT concat(id, '-', key1, '-', col1) FROM tsubq WHERE tsubq.key1 = 333 ORDER BY tsubq.id ASC LIMIT 1) from t1;
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | SUBQUERY | tsubq | ref | key1 | key1 | 5 | const | 999 | Using where |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            {noformat}
            Create a test dataset:
            {noformat}
            create table t1 (a int);
            insert into t1 select * from test.one_k;

            create table tsubq(
              id int primary key,
              key1 int,
              col1 int,
              key(key1)
            ) engine=innodb;

            insert into tsubq
              select A.a + B.a*1000, A.a, 123456 from test.one_k A, test.one_k B;
            {noformat}

            Then, check the plan:
            {noformat}
            explain select
               (SELECT
                  concat(id, '-', key1, '-', col1)
                FROM tsubq
                WHERE tsubq.key1 = t1.a
                ORDER BY tsubq.id ASC LIMIT 1)
            from
              t1;
            {noformat}


            {noformat}
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | DEPENDENT SUBQUERY | tsubq | index | key1 | PRIMARY | 4 | NULL | 1 | Using where |
            +------+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
            {noformat}

            The subquery uses "index" access, which is very inefficient. The estimate for #rows seems to come from the LIMIT clause and is very wrong in this case.

            The table is InnoDB (with extended keys). The index KEY(key1) is actually KEY(key1, id). The query has a restriction on key1 which makes it constant (tsubq.key1 = t1.a). After that, ORDER BY tsubq.id is achieved automatically.

            The problem seems to be specifically with references to outside of subquery. If I use a constant instead, the query plan is able to use key1:

            {noformat}
            explain select (SELECT concat(id, '-', key1, '-', col1) FROM tsubq WHERE tsubq.key1 = 333 ORDER BY tsubq.id ASC LIMIT 1) from t1;
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | |
            | 2 | SUBQUERY | tsubq | ref | key1 | key1 | 5 | const | 999 | Using where |
            +------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
            {noformat}

            Committed a fix for the above suggestion. I'll also investigate why did it pick query plan with 'index' (as opposed to query plan with 'ref' + 'Using filesort').

            psergei Sergei Petrunia added a comment - Committed a fix for the above suggestion. I'll also investigate why did it pick query plan with 'index' (as opposed to query plan with 'ref' + 'Using filesort').

            Trying the original testcase on different vesions of MySQL/MariaDB:

            5.3.13-MariaDB-log

            id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    700
            2       DEPENDENT SUBQUERY      tsubq   index   key1    PRIMARY 4       NULL    1       Using where

            5.5.37-MariaDB-log

            id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    1351
            2       DEPENDENT SUBQUERY      tsubq   index   key1    PRIMARY 4       NULL    1       Using where

            5.6.16-debug-log

            id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    1000    NULL
            2       DEPENDENT SUBQUERY      tsubq   ref     key1    key1    5       test.t1.a       493     Using where; Using filesort

            MySQL 5.5.32, MySQL 5.5.37

            id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
            1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    1157
            2       DEPENDENT SUBQUERY      tsubq   index   key1    PRIMARY 4       NULL    1       Using where

            psergei Sergei Petrunia added a comment - Trying the original testcase on different vesions of MySQL/MariaDB: 5.3.13-MariaDB-log id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 700 2 DEPENDENT SUBQUERY tsubq index key1 PRIMARY 4 NULL 1 Using where 5.5.37-MariaDB-log id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 1351 2 DEPENDENT SUBQUERY tsubq index key1 PRIMARY 4 NULL 1 Using where 5.6.16-debug-log id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 1000 NULL 2 DEPENDENT SUBQUERY tsubq ref key1 key1 5 test.t1.a 493 Using where; Using filesort MySQL 5.5.32, MySQL 5.5.37 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 1157 2 DEPENDENT SUBQUERY tsubq index key1 PRIMARY 4 NULL 1 Using where
            psergei Sergei Petrunia made changes -
            Affects Version/s 5.3.12 [ 12000 ]
            Affects Version/s 5.5.36 [ 14600 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.0.11 [ 15200 ]
            psergei Sergei Petrunia made changes -
            Labels upstream, upstream-5.5
            psergei Sergei Petrunia made changes -
            Labels upstream, upstream-5.5 upstream upstream-5.5
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            Looking at the above: MySQL 5.5 and MariaDB 5.3/5.5 had the same "old" query plan. MySQL 5.6 has a "new" query plan. We need to find out what has caused the difference.

            psergei Sergei Petrunia added a comment - Looking at the above: MySQL 5.5 and MariaDB 5.3/5.5 had the same "old" query plan. MySQL 5.6 has a "new" query plan. We need to find out what has caused the difference.

            The change in 5.6 was caused by this patch (it is for 5.7 but it was backported to 5.6):

            revno: 4502
            revision-id: chaithra.gopalareddy@oracle.com-20120918093140-2ouxgmh0so1j92jw
            committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
            branch nick: mysql-trunk
            timestamp: Tue 2012-09-18 15:01:40 +0530
            message:
            Bug#14338686: MYSQL IS GENERATING DIFFERENT AND SLOWER
            (IN NEWER VERSIONS) EXECUTION PLAN
            PROBLEM:
            While checking for an index to sort for the order by clause
            in this query
            "SELECT datestamp FROM contractStatusHistory WHERE
            contract_id = contracts.id ORDER BY datestamp asc limit 1;"
            ...

            psergei Sergei Petrunia added a comment - The change in 5.6 was caused by this patch (it is for 5.7 but it was backported to 5.6): revno: 4502 revision-id: chaithra.gopalareddy@oracle.com-20120918093140-2ouxgmh0so1j92jw committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com> branch nick: mysql-trunk timestamp: Tue 2012-09-18 15:01:40 +0530 message: Bug#14338686: MYSQL IS GENERATING DIFFERENT AND SLOWER (IN NEWER VERSIONS) EXECUTION PLAN PROBLEM: While checking for an index to sort for the order by clause in this query "SELECT datestamp FROM contractStatusHistory WHERE contract_id = contracts.id ORDER BY datestamp asc limit 1;" ...

            The patch has no testcase but looks reasonable.

            psergei Sergei Petrunia added a comment - The patch has no testcase but looks reasonable.
            psergei Sergei Petrunia made changes -

            Will address the issue with cost calculations in MDEV-6081.

            psergei Sergei Petrunia added a comment - Will address the issue with cost calculations in MDEV-6081 .

            Fix pushed to 10.0

            psergei Sergei Petrunia added a comment - Fix pushed to 10.0
            psergei Sergei Petrunia made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow defaullt [ 38410 ] MariaDB v2 [ 43782 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 43782 ] MariaDB v3 [ 62972 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 62972 ] MariaDB v4 [ 147757 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.