[MDEV-4144] simple subquery causes full scan instead of range scan Created: 2013-02-06  Updated: 2013-04-02  Resolved: 2013-03-29

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.0, 5.5.29, 5.3.12
Fix Version/s: 10.0.2, 5.5.31, 5.3.13

Type: Bug Priority: Major
Reporter: Patryk Pomykalski Assignee: Timour Katchaounov (Inactive)
Resolution: Fixed Votes: 0
Labels: optimizer


 Description   

Test case:

--source include/have_innodb.inc
 
CREATE TABLE t (
  id int not null auto_increment,
  x int not null,
  primary key(id)
)engine=innodb;
 
insert into t (x) values(0),(0),(0);
insert into t (x) select 0 from t as t1,t as t2;
insert into t (x) select 0 from t as t1,t as t2;
insert into t (x) select 0 from t as t1,t as t2;
 
SELECT (SELECT MAX(id) - 1000 FROM t) INTO @a;
FLUSH STATUS;
SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
SHOW STATUS LIKE 'handler_read%';
FLUSH STATUS;
SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
SHOW STATUS LIKE 'handler_read%';
DROP TABLE t;

output on mariadb (tested 5.3.5, 5.5.28, 5.5.29 and 10.0.0):

mysql [localhost] {msandbox} (test) > SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
+---+
| x |
+---+
| 0 |
+---+
1 row in set (0.00 sec)
 
mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'handler_read%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 1     |
| Handler_read_last        | 0     |
| Handler_read_next        | 1000  |
| Handler_read_prev        | 0     |
| Handler_read_rnd         | 0     |
| Handler_read_rnd_deleted | 0     |
| Handler_read_rnd_next    | 0     |
+--------------------------+-------+
8 rows in set (0.00 sec)
 
mysql [localhost] {msandbox} (test) > FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
 
mysql [localhost] {msandbox} (test) > SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x DESC LIMIT 1;
+---+
| x |
+-- +
| 0 |
+---+
1 row in set (0.00 sec)
 
mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'handler_read%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 0     |
| Handler_read_last        | 1     |
| Handler_read_next        | 0     |
| Handler_read_prev        | 0     |
| Handler_read_rnd         | 0     |
| Handler_read_rnd_deleted | 0     |
| Handler_read_rnd_next    | 24493 |
+--------------------------+-------+

output on mysql (tested 5.0 and 5.1):

mysql> SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
+---+
| x |
+---+
| 0 |
+---+
1 row in set (0.00 sec)
 
mysql> SHOW STATUS LIKE 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 3     |
| Handler_read_next     | 1000  |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
6 rows in set (0.00 sec)
 
mysql> FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
 
mysql> SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
+---+
| x |
+---+
| 0 |
+---+
1 row in set (0.00 sec)
 
mysql> SHOW STATUS LIKE 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_next     | 1000  |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+



 Comments   
Comment by Patryk Pomykalski [ 2013-02-06 ]

forgot about explains:

mariadb:
mysql [localhost]

{msandbox}

(test) > explain SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x DESC LIMIT 1;
---------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

---------------------------------------------------------------------------------------+

1 PRIMARY t ALL PRIMARY NULL NULL NULL 24790 Using where; Using filesort
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away

---------------------------------------------------------------------------------------+

mysql 5.1:
mysql> explain SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
----------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

----------------------------------------------------------------------------------------+

1 PRIMARY t range PRIMARY PRIMARY 4 NULL 1342 Using where; Using filesort
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away

----------------------------------------------------------------------------------------+

Comment by Patryk Pomykalski [ 2013-02-06 ]

Mysql 5.5.30 works ok too:

mysql [localhost]

{msandbox}

(test) > explain SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
----------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

----------------------------------------------------------------------------------------+

1 PRIMARY t range PRIMARY PRIMARY 4 NULL 999 Using where; Using filesort
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away

----------------------------------------------------------------------------------------+

Comment by Elena Stepanova [ 2013-02-06 ]

Reproducible as described, with a bigger table as well (I went up to ~800K rows)
Assigned to Sergei for an initial assessment, please reassign if needed.

Comment by Patryk Pomykalski [ 2013-02-15 ]

debugging...

In get_mm_tree the subquery is considered expensive (not optimized).

Comment by Patryk Pomykalski [ 2013-02-15 ]

I'm not sure if this is a proper solution, but it works. Some query plans change.

=== modified file 'sql/item_subselect.cc'
— sql/item_subselect.cc 2013-01-15 18:13:32 +0000
+++ sql/item_subselect.cc 2013-02-15 16:29:44 +0000
@@ -547,6 +547,9 @@
if (!cur_join)
continue;

+ if (cur_join->table_count == cur_join->const_tables)
+ return false;
+
/* If a subquery is not optimized we cannot estimate its cost. */
if (!cur_join->join_tab)
return true;

Comment by Sergei Petrunia [ 2013-02-19 ]

Agree.. The subquery is considered expensive, and range optimizer doesn't consider the "id > (SELECT ...)" predicate.

Comment by Sergei Petrunia [ 2013-02-19 ]

What is weird, is why is this subquery considered expensive. The code in item_subselect.cc: Item_subselect::is_expensive() looks as if it was written to be able to distinguish between the expensive and in-expensive subqueries. Yet, debugging shows that execution in Item_subselect::is_expensive() returns here:

/* If a subquery is not optimized we cannot estimate its cost. */
if (!cur_join->join_tab)
return true;

I'm wondering whether Patryk's addition may hit the subqueries that are not-yet-optimized (Do those subqueries may have cur_join->table_count == cur_join->const_tables?).

Will discuss with Timour (his code).

Comment by Patryk Pomykalski [ 2013-03-27 ]

There's actually 'optimized' flag in join so maybe it should be something like this:

— sql/item_subselect.cc 2013-03-17 10:41:25 +0000
+++ sql/item_subselect.cc 2013-03-27 14:03:06 +0000
@@ -548,8 +548,11 @@
continue;

/* If a subquery is not optimized we cannot estimate its cost. */
+ if (!cur_join->optimized)
+ return true;
+
if (!cur_join->join_tab)

  • return true;
    + continue;

if (sl->first_inner_unit())
{

Comment by Timour Katchaounov (Inactive) [ 2013-03-28 ]

Patryk, you are very close to the solution, however JOIN::optimized is set in the very beginning of JOIN::optimize.
As a result it may happen that JOIN::optimize didn't produce an executable plan. At the same time, one of the
ideas behind ::is_expensive() is to tell the optimizer that an expression is not expensive, and thus it can be evaluated
during optimization. So this method should rule out subquery plans that cannot or should not be evaluated during optimization.

There are much more extensive flags that describe the state of a query plan in a development branch of a 10.0 feature,
however these cannot be backported to a GA version.

Comment by Timour Katchaounov (Inactive) [ 2013-03-29 ]

Pushed to 5.5.31

Generated at Thu Feb 08 06:54:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.