[MDEV-7982] EXPLAIN shows meaningless query plan with Distinct Created: 2015-04-13  Updated: 2016-04-17  Resolved: 2016-04-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.17, 5.3.13, 5.5, 10.1
Fix Version/s: 10.2.0

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: explain, verified

Attachments: File BUG-j9m.sql    

 Description   

Take the attached dataset and run this query:

explain  select distinct count(*) from t2 where a between 10 and 20 group by a;
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                               |
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
|    1 | SIMPLE      | t2    | range | a             | a    | 5       | NULL |   10 | Using where; Using index; Distinct; Using temporary; Using filesort |
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

and see an apparently non-sensical output (how can a table use "Distinct" if it
is the only table in the join?)

This is repeatable on 10.0.17, didn't check 5.5



 Comments   
Comment by Sergei Petrunia [ 2015-04-13 ]

Debugging on 10.0, looking at this line of code in JOIN::save_explain_data_intern:

	if (distinct & test_all_bits(used_tables,
                                     join->select_list_used_tables))

select_list_used_tables==0 when we come here... which is weird, JOIN::optimize_inner does call JOIN::eval_select_list_used_tables which should have put a correct value there.

in eval_select_list_used_tables we have:

(gdb) p item
  $11 = (Item *) 0x7fffa4006618
(gdb) p dbug_print_item( item)
  $12 = 0x1842700 "count(0)"
(gdb) p item->used_tables()
  $13 = 0

One can debate what (count)->used_tables() should return, but this is not the point.

The point is, if we run

  SELECT DISTINCT COUNT(*) 

then it is obvious that Distinct optimization is not applicable.
Short-circuiting logic applies when running

  SELECT DISTINCT func(table_column)
  

but not when one runs

SELECT DISTINCT aggregate_func.

(unless maybe some cases with Loose Scan).

Comment by Elena Stepanova [ 2015-04-13 ]

On 5.3 and 5.5, it looks like this:

elenst@wheezy-64:~/git/5.5$ ml test -e "explain  select distinct count(*) from t2 where a between 10 and 20 group by a;"
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                               |
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+
|    1 | SIMPLE      | t2    | range | a             | a    | 5       | NULL |   10 | Using where; Using index; Using temporary; Using filesort; Distinct |
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------+

Comment by Sergei Petrunia [ 2016-04-17 ]

This is fixed in 10.2 tree by post-MDEV-8646 fixes.

Generated at Thu Feb 08 07:23:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.