[MDEV-15433] Optimizer does not use group by optimization with distinct Created: 2018-02-27  Updated: 2020-08-25  Resolved: 2018-08-05

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Affects Version/s: 10.1.31
Fix Version/s: 10.1.35

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 3
Labels: optimizer

Issue Links:
Relates
relates to MDEV-10880 Assertions `keypart_map' or `prebuilt... Closed

 Description   

This is similar to the following upstream bug that was fixed in MySQL 5.6, but this problem seems to effect all InnoDB tables in MariaDB 10.1, not just partitioned ones:

https://bugs.mysql.com/bug.php?id=60023

For example, run the following test:

CREATE TABLE tbl1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, a INT NOT NULL, KEY(a)) ENGINE=InnoDB;
INSERT INTO tbl1(a) VALUES (1), (2), (3), (4);
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
 
OPTIMIZE TABLE tbl1;
EXPLAIN SELECT DISTINCT a FROM tbl1;
SELECT DISTINCT a FROM tbl1;
EXPLAIN SELECT a FROM tbl1 GROUP BY a;
SELECT a FROM tbl1 GROUP BY a;

Here's the output seen on MariaDB 10.1.31:

MariaDB [db1]> EXPLAIN SELECT DISTINCT a FROM tbl1;
+------+-------------+-------+-------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-------+-------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | tbl1  | index | NULL          | a    | 4       | NULL | 65895 | Using index |
+------+-------------+-------+-------+---------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)
 
MariaDB [db1]> SELECT DISTINCT a FROM tbl1;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
4 rows in set (0.02 sec)
 
MariaDB [db1]> EXPLAIN SELECT a FROM tbl1 GROUP BY a;
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
 
MariaDB [db1]> SELECT a FROM tbl1 GROUP BY a;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
4 rows in set (0.00 sec)

Compare this to the output seen on 5.6.38:

mysql> EXPLAIN SELECT DISTINCT a FROM tbl1;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | tbl1  | range | a             | a    | 4       | NULL |    7 | Using index for group-by |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT DISTINCT a FROM tbl1;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
4 rows in set (0.00 sec)
 
mysql> EXPLAIN SELECT a FROM tbl1 GROUP BY a;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | tbl1  | range | a             | a    | 4       | NULL |    7 | Using index for group-by |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT a FROM tbl1 GROUP BY a;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
4 rows in set (0.00 sec)



 Comments   
Comment by Richard Stracke [ 2018-03-02 ]

On MariaDB 10.2 it works with

 
SET SESSION use_stat_tables='NEVER';

On 10.1. a workarround is to combine group by and distinct.

EXPLAIN SELECT DISTINCT a FROM tbl1 GROUP BY a;

Comment by Alice Sherepa [ 2018-03-06 ]

Not reproducible on MariaDB 5.5, 10.2, 10.3 - Using index for group-by in all cases;
on 10.1 - reproducible with Innodb and MyIsam, with 10.0, 10.1, with Innodb, if query "optimize table" was not executed, then-"using index" -for both queries;
with MyISAM/10.1

ANALYZE SELECT DISTINCT a FROM tbl1
+------+-------------+-------+-------+---------------+------+---------+------+-------+----------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | r_rows   | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+------+---------+------+-------+----------+----------+------------+-------------+
|    1 | SIMPLE      | tbl1  | index | NULL          | a    | 4       | NULL | 65536 | 65536.00 |   100.00 |     100.00 | Using index |
+------+-------------+-------+-------+---------------+------+---------+------+-------+----------+----------+------------+-------------+
1 row in set (0.06 sec)
 
ANALYZE SELECT a FROM tbl1 GROUP BY a
+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    5 |   4.00 |   100.00 |     100.00 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+--------------------------+
1 row in set (0.00 sec)

Innodb/10.1

--source include/have_innodb.inc
 
CREATE TABLE tbl1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, a INT NOT NULL, KEY(a)) ENGINE=Innodb;
INSERT INTO tbl1(a) VALUES (1), (2), (3), (4);
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
INSERT INTO tbl1(a) SELECT a FROM tbl1;
 
OPTIMIZE TABLE tbl1;
ANALYZE SELECT DISTINCT a FROM tbl1;
ANALYZE SELECT a FROM tbl1 GROUP BY a;
 
DROP TABLE tbl1;

Sporadic, not using group-by optimization
1)

ANALYZE SELECT DISTINCT a FROM tbl1
+------+-------------+-------+-------+---------------+------+---------+------+-------+----------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | r_rows   | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+---------------+------+---------+------+-------+----------+----------+------------+-------------+
|    1 | SIMPLE      | tbl1  | index | NULL          | a    | 4       | NULL | 64244 | 65536.00 |   100.00 |     100.00 | Using index |
+------+-------------+-------+-------+---------------+------+---------+------+-------+----------+----------+------------+-------------+
1 row in set (0.18 sec)
 
ANALYZE SELECT a FROM tbl1 GROUP BY a
+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 |   4.00 |   100.00 |     100.00 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------+----------+------------+--------------------------+
1 row in set (0.00 sec)

or sometimes like this:

 ANALYZE SELECT DISTINCT a FROM tbl1;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
-1	SIMPLE	tbl1	index	NULL	a	4	NULL	66012	65536.00	100.00	100.00	Using index
+1	SIMPLE	tbl1	index	NULL	a	4	NULL	63323	65536.00	100.00	100.00	Using index
 
 ANALYZE SELECT a FROM tbl1 GROUP BY a;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
-1	SIMPLE	tbl1	range	NULL	a	4	NULL	3	4.00	100.00	100.00	Using index for group-by
+1	SIMPLE	tbl1	range	NULL	a	4	NULL	15	4.00	100.00	100.00	Using index for group-by

Comment by Varun Gupta (Inactive) [ 2018-08-02 ]

Patch
http://lists.askmonty.org/pipermail/commits/2018-August/012754.html

this patch reverts the commit for MDEV-10880

commit b76b69cd5fe634d8ddb9406aa2c82ef2a375b4d8
Author: Oleksandr Byelkin <sanja@mariadb.com>
Date:   Tue Jun 20 14:55:30 2017 +0200
 
    MDEV-10880: Assertions `keypart_map' or `prebuilt->search_tuple->n_fields > 0' fail on DISTINCT and GROUP BY constant
    
    add_group_and_distinct_keys() should take into account JOIN::simple_group.

Comment by Varun Gupta (Inactive) [ 2018-08-03 ]

After discussion with igor we have concluded that the fix for MDEV-10880 is not correct, so we will have to revert the fix.

Comment by Varun Gupta (Inactive) [ 2018-08-03 ]

 
When do we use GROUP BY optimization
 
1) MIN , MAX with GROUP BY does GROUP BY optimization
 
MariaDB [test]> explain select max(b) , min(b) from tbl1 group by a;
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
 
2) DISINCT without GROUP BY does GROUP BY optimization
 
MariaDB [test]> explain select distinct a from tbl1;
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
 
 
3) With GROUP BY we do GROUP BY optimization 
 
MariaDB [test]> explain select b+4 from tbl1 group by a;
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
 
 
 
4) With Distinct and GROUP by , we can do GROUP BY optimization only if the rules are followed by the GROUP BY clause
MariaDB [test]> explain select distinct a from tbl1 group by a;
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
 
 
5) AGG_FUNC(DISTINCT column_list) does GROUP BY optimization.
MariaDB [test]> explain select count(distinct a) from tbl1;
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 4       | NULL |    7 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

Comment by Varun Gupta (Inactive) [ 2018-08-03 ]

In MDEV-10880 the query is

SELECT DISTINCT a FROM tbl1 GROUP BY 'foo';

MariaDB [test]> explain SELECT DISTINCT a FROM tbl1 GROUP BY 'foo';
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|    1 | SIMPLE      | tbl1  | range | NULL          | a    | 0       | NULL |    2 | Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

The key_len= 0 which tells us that the length of the key on the GROUP BY column is 0. For such a case I think we should add a case of not allowing the GROUP BY optimization.

This case comes under the category (4) as mentioned above.

Comment by Varun Gupta (Inactive) [ 2018-08-03 ]

the code in get_best_group_min_max

/*
   Check (GA2) if this is a DISTINCT query.
   If GA2, then Store a new ORDER object in group_fields_array at the
   position of the key part of item_field->field. Thus we get the ORDER
   objects for each field ordered as the corresponding key parts.
   Later group_fields_array of ORDER objects is used to convert the query
   to a GROUP query.
*/
   if ((!join->group && join->select_distinct)

So here we check if we have GROUP BY or not and if we don't have GROUP BY and distinct then we convert DISTINCT to GROUP BY.

So for our case we don't enter the IF condition here and so we don't have any groups over which we will create the group by prefix and that is why we end up with key_len=0. So we should not allow such cases.

Comment by Varun Gupta (Inactive) [ 2018-08-04 ]

Second patch
http://lists.askmonty.org/pipermail/commits/2018-August/012769.html

Generated at Thu Feb 08 08:21:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.