[MDEV-4189] Poor execution plan for single-table grouping query with limit clause Created: 2013-02-19  Updated: 2019-05-19

Status: Stalled
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.1, 5.5.29, 5.1.67, 5.2.14
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Igor Babaev Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 1
Labels: order-by-optimization


 Description   

In many cases, when there is no index compatible with the group list of a query over one table, and, at the same time, a range condition over an index can be extracted from the WHERE clause, the optimizer chooses the execution plan over this range rather than a table scan plan as soon as the query uses a limit clause.
Without limit clause the optimizer chooses a table scan if it turns out to
be cheaper than the range scan.

The problem can be demonstrated on the following test case.

Create and populate table t1 with following commands:

create table t1 (
pk int primary key auto_increment, b int, c int, index (b)
);
 
insert into t1(b,c) values
(1,10), (2,20), (3,30), (4,40), (5,50), (6,60), (7,70), (8,80), (9,90);
insert into t1(b,c) select b,c from t1;
insert into t1(b,c) select b,c from t1;
insert into t1(b,c) select b,c from t1;
insert into t1(b,c) select b,c from t1;

Then you'll see with any MariaDB version/release that the query

  select c from t1 where b != 5 group by c;

is executed with a table scan,
while the query

  select c from t1 where b != 5 group by c limit 2;

is executed with a range scan that is very inefficient here:

MariaDB [test]> explain select c from t1 where b != 5 group by c;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | t1    | ALL  | b             | NULL | NULL    | NULL |  144 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------+
 
 
MariaDB [test]> explain select c from t1 where b != 5 group by c limit 2;
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                        |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | t1    | range | b             | b    | 5       | NULL |  136 | Using where; Using temporary; Using filesort |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------------------+



 Comments   
Comment by Sergei Petrunia [ 2013-02-25 ]

We have also agreed that current code is wrong in not taking the presence of GROUP BY into account; "limit" parameter of test_quick_select() should be the same as it was during the regular range analysis. (But note the ICP difference I've marked above. Perhaps, this is hitting some bug.)

Comment by Sergei Petrunia [ 2013-05-04 ]

make_join_statistics() calls get_quick_record_count() with join->row_limit
as the last parameter:

...
      records= get_quick_record_count(join->thd, select, s->table,
				      &s->const_keys, join->row_limit);
...

get_quick_record_count passes that number to test_quick_select() as its
'limit' argument:

...
    if ((error= select->test_quick_select(thd, *(key_map *)keys,(table_map) 0,
                                          limit, 0, FALSE)) == 1)
...

That is, the idea "always pass the same limit parameter to test_quick_select()"
is implemented by this patch:

=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc	2013-05-03 20:56:50 +0000
+++ sql/sql_select.cc	2013-05-04 18:49:25 +0000
@@ -8803,10 +8803,7 @@ make_join_select(JOIN *join,SQL_SELECT *
 	    if (sel->test_quick_select(thd, tab->keys,
 				       ((used_tables & ~ current_map) |
                                         OUTER_REF_TABLE_BIT),
-				       (join->select_options &
-					OPTION_FOUND_ROWS ?
-					HA_POS_ERROR :
-					join->unit->select_limit_cnt), 0,
+				        join->row_limit, 0,
                                         FALSE) < 0)
             {
 	      /*
@@ -8817,11 +8814,8 @@ make_join_select(JOIN *join,SQL_SELECT *
               if (!*tab->on_expr_ref ||
                   sel->test_quick_select(thd, tab->keys,
                                          used_tables & ~ current_map,
-                                         (join->select_options &
-                                          OPTION_FOUND_ROWS ?
-                                          HA_POS_ERROR :
-                                          join->unit->select_limit_cnt),0,
-                                          FALSE) < 0)
+                                         join->row_limit,0,
+                                         FALSE) < 0)
 		DBUG_RETURN(1);			// Impossible WHERE
             }
             else

Comment by Daniel Black [ 2015-06-22 ]

is this patch worth applying now?

Comment by Sergei Petrunia [ 2019-05-19 ]

In connection with MDEV-15777, had another discussion about the code in question with Monty.
Committed a better comment:
https://github.com/MariaDB/server/commit/91efcc6392cef920aa3697dc9789830ae9cdd379

Generated at Thu Feb 08 06:54:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.