Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13275

Query optimizer not picking optimal ORDER BY PRIMARY in case of INNER JOIN on PRIMARY like it does for similar WHERE condition

Details

    Description

      Order by a column that isn't the first column in the join list requires a temporary table.

      A query that INNER JOINs two tables on their Primary keys with a WHERE clause on a particular key value properly optimizes to target the first table in the join order for the WHERE clause, presumably based on the logical equivalence of the values implied by the INNER JOIN:

      Query second table's id (PRIMARY):

      SELECT first.id FROM first INNER JOIN second ON first.id = second.id WHERE second.id > '5E1215B77BB14DA4DD7C9D4DDD26501A';
      

      Optimized query (from EXPLAIN EXTENDED warning) points the WHERE clause to the first table:

      select `db`.`first`.`id` AS `id` from `db`.`first` join `db`.`second` where ((`db`.`second`.`id` = `db`.`first`.`id`) and (`db`.`first`.`id` > '5E1215B77BB14DA4DD7C9D4DDD26501A'))
      

      The same thing does not happen for the ORDER BY, and so changing which table's Primary key we order by makes the difference between requiring a temporary table and not:

      This query (ORDER BY first) is basically just an index retrieval:

      > ANALYZE SELECT first.id FROM first INNER JOIN second ON first.id = second.id ORDER BY first.id;
      +------+-------------+--------+--------+---------------+---------+---------+------------------------+------+--------+----------+------------+-------------+
      | id   | select_type | table  | type   | possible_keys | key     | key_len | ref                    | rows | r_rows | filtered | r_filtered | Extra       |
      +------+-------------+--------+--------+---------------+---------+---------+------------------------+------+--------+----------+------------+-------------+
      |    1 | SIMPLE      | first  | index  | PRIMARY       | PRIMARY | 96      | NULL                   |  100 | 100.00 |   100.00 |     100.00 | Using index |
      |    1 | SIMPLE      | second | eq_ref | PRIMARY       | PRIMARY | 96      | huff_20170614.first.id |    1 |   1.00 |   100.00 |     100.00 | Using index |
      +------+-------------+--------+--------+---------------+---------+---------+------------------------+------+--------+----------+------------+-------------+
      

      But if you change the ORDER BY to the second table, it now requires a temporary table:

      > ANALYZE SELECT first.id FROM first INNER JOIN second ON first.id = second.id ORDER BY second.id;
      +------+-------------+--------+--------+---------------+---------+---------+------------------------+------+--------+----------+------------+----------------------------------------------+
      | id   | select_type | table  | type   | possible_keys | key     | key_len | ref                    | rows | r_rows | filtered | r_filtered | Extra                                        |
      +------+-------------+--------+--------+---------------+---------+---------+------------------------+------+--------+----------+------------+----------------------------------------------+
      |    1 | SIMPLE      | first  | index  | PRIMARY       | PRIMARY | 96      | NULL                   |  100 | 100.00 |   100.00 |     100.00 | Using index; Using temporary; Using filesort |
      |    1 | SIMPLE      | second | eq_ref | PRIMARY       | PRIMARY | 96      | huff_20170614.first.id |    1 |   1.00 |   100.00 |     100.00 | Using index                                  |
      +------+-------------+--------+--------+---------------+---------+---------+------------------------+------+--------+----------+------------+----------------------------------------------+
      

      In our real-life example, the table was ~2GB and tmp_table_size was 64MB, so this meant that the temp table actually went to disk.

      Notable status counter differences:

      Handler_commit	1	1
      Handler_read_first	1	1
      Handler_read_key	906	906
      Handler_read_next	906	906
      Handler_read_rnd	0	906
      Handler_read_rnd_next	0	907
      Handler_tmp_write	0	906
      

      Attachments

        Issue Links

          Activity

            Ok, so `orderby_uses_equalities` is unable use this equality because of its datatype.

            psergei Sergei Petrunia added a comment - Ok, so `orderby_uses_equalities` is unable use this equality because of its datatype.

            A little background about equality propagation. There are two kinds of equality propagation.
            1. Full substitution - this allows to make inferences like X=Y AND cond(X) -> X=Y AND cond(Y).
            2. Substitution for the purpose of comparison (S4PC) allows to make inferences like X=Y and X < expr -> X=Y and Y < expr

            Full substitution is often unusable for [VAR]CHAR. The common reasons are case-insensitive comparisons ( 'a'='A' ) and padding ('A' = 'A '). However, they don't prevent S4PC.

            My point is that orderby_uses_equalities should use S4PC. While this bug looks like it is relying on Full Substitution instead.

            psergei Sergei Petrunia added a comment - A little background about equality propagation. There are two kinds of equality propagation. 1. Full substitution - this allows to make inferences like X=Y AND cond(X) -> X=Y AND cond(Y) . 2. Substitution for the purpose of comparison (S4PC) allows to make inferences like X=Y and X < expr -> X=Y and Y < expr Full substitution is often unusable for [VAR] CHAR. The common reasons are case-insensitive comparisons ( 'a'='A' ) and padding ('A' = 'A '). However, they don't prevent S4PC. My point is that orderby_uses_equalities should use S4PC. While this bug looks like it is relying on Full Substitution instead.

            Example of S4PC in action:

            MariaDB [test]> show create table t1;
            +-------+-----------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                              |
            +-------+-----------------------------------------------------------------------------------------------------------+
            | t1    | CREATE TABLE `t1` (
              `id` char(32) NOT NULL,
              PRIMARY KEY (`id`)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
            +-------+-----------------------------------------------------------------------------------------------------------+
            1 row in set (0.000 sec)
            

            10:45

            MariaDB [test]> show create table t2;
            +-------+-----------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                              |
            +-------+-----------------------------------------------------------------------------------------------------------+
            | t2    | CREATE TABLE `t2` (
              `id` char(32) NOT NULL,
              PRIMARY KEY (`id`)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
            +-------+-----------------------------------------------------------------------------------------------------------+
            1 row in set (0.000 sec)
            

            analyze format=json select * from t1,t2 where t1.id > 'a' and t2.id=t1.id
            | {
              "query_block": {
                "select_id": 1,
                "r_loops": 1,
                "r_total_time_ms": 0.5222,
                "table": {
                  "table_name": "t2",
                  "access_type": "range",
                  "possible_keys": ["PRIMARY"],
                  "key": "PRIMARY",
                  "key_length": "32",
                  "used_key_parts": ["id"],
                  "r_loops": 1,
                  "rows": 3,
                  "r_rows": 3,
                  "r_total_time_ms": 0.1872,
                  "filtered": 100,
                  "r_filtered": 100,
                  "attached_condition": "t2.`id` > 'a'",
                  "using_index": true
                },
                "table": {
                  "table_name": "t1",
                  "access_type": "eq_ref",
                  "possible_keys": ["PRIMARY"],
                  "key": "PRIMARY",
                  "key_length": "32",
                  "used_key_parts": ["id"],
                  "ref": ["test.t2.id"],
                  "r_loops": 3,
                  "rows": 1,
                  "r_rows": 1,
                  "r_total_time_ms": 0.1543,
                  "filtered": 100,
                  "r_filtered": 100,
                  "using_index": true
                }
              }
            } |
            

            psergei Sergei Petrunia added a comment - Example of S4PC in action: MariaDB [test]> show create table t1; +-------+-----------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------+ | t1 | CREATE TABLE `t1` ( `id` char(32) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+-----------------------------------------------------------------------------------------------------------+ 1 row in set (0.000 sec) 10:45 MariaDB [test]> show create table t2; +-------+-----------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------+ | t2 | CREATE TABLE `t2` ( `id` char(32) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+-----------------------------------------------------------------------------------------------------------+ 1 row in set (0.000 sec) analyze format=json select * from t1,t2 where t1.id > 'a' and t2.id=t1.id | { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 0.5222, "table": { "table_name": "t2", "access_type": "range", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "32", "used_key_parts": ["id"], "r_loops": 1, "rows": 3, "r_rows": 3, "r_total_time_ms": 0.1872, "filtered": 100, "r_filtered": 100, "attached_condition": "t2.`id` > 'a'", "using_index": true }, "table": { "table_name": "t1", "access_type": "eq_ref", "possible_keys": ["PRIMARY"], "key": "PRIMARY", "key_length": "32", "used_key_parts": ["id"], "ref": ["test.t2.id"], "r_loops": 3, "rows": 1, "r_rows": 1, "r_total_time_ms": 0.1543, "filtered": 100, "r_filtered": 100, "using_index": true } } } |

            varun, please find the code that does S4PC and then let's

            • verify that orderby_uses_equalities uses full substitution instead
            • discuss whether it could use S4PC also.

            (TODO: does orderby_uses_equalities handle cases where full substitution applies while S4PC does not? e.g. {{ t1.datetime_col=t2.datetime_col ORDER BY MONTH(t1.datetime_col)}} ? )

            psergei Sergei Petrunia added a comment - varun , please find the code that does S4PC and then let's verify that orderby_uses_equalities uses full substitution instead discuss whether it could use S4PC also. (TODO: does orderby_uses_equalities handle cases where full substitution applies while S4PC does not? e.g. {{ t1.datetime_col=t2.datetime_col ORDER BY MONTH(t1.datetime_col)}} ? )

            For S4PC we will not have a problem even if we have a function in the order by list.
            lets take a query for example:

            select * from t1,t2 where t1.id = t2.id order by length(t1.id)

            so the substitution of t1.id with t2.id would not happen inside the length(..) function. In the optimization orderby_uses_equalites we do substitution only for the top level items(not functional items).

            varun Varun Gupta (Inactive) added a comment - For S4PC we will not have a problem even if we have a function in the order by list. lets take a query for example: select * from t1,t2 where t1.id = t2.id order by length(t1.id) so the substitution of t1.id with t2.id would not happen inside the length(..) function. In the optimization orderby_uses_equalites we do substitution only for the top level items(not functional items).

            People

              psergei Sergei Petrunia
              joey.mart Joey Mart (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.