[MDEV-13744] Force index for group by is not always honored Created: 2017-09-06  Updated: 2020-08-25  Resolved: 2018-11-02

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Richard Stracke Assignee: Sergei Petrunia
Resolution: Not a Bug Votes: 2
Labels: None
Environment:

tested on docker container



 Description   

MariaDB optimizer does not seem to honor the "force index for group by" index hint :

To reproduce:

 
drop table if exists t;
create table t(id int auto_increment, ts timestamp not null, primary key(id,ts)) engine=InnoDB, stats_sample_pages = 20;
insert into t(ts) values ('2017-01-01');
insert into t(ts) values ('2017-01-01');
replace into t(ts) select t1.ts from t t1, t t2, t t3, t t4, t t5, t t6, t t7, t t8, t t9, t t10, t t11, t t12, t t13, t t14, t t15;
insert into t(id,ts) select distinct id, '2017-01-02' from t;
select count(distinct id) from t;
analyze table t;
show indexes from t;
explain select id, max(ts) from t group by id;
explain select id, max(ts) from t force index for group by (primary) group by id;

analyze format = json select id, max(ts) from t force index for group by (primary) group by

 
{
  "query_block": {
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 28.534,
    "table": {
      "table_name": "t",
      "access_type": "index",
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["id", "ts"],
      "r_loops": 1,
      "rows": 66792,
      "r_rows": 65540,
      "r_total_time_ms": 9.2646,
      "filtered": 100,
      "r_filtered": 100,
      "using_index": true
    }
  }
}

Cardinality in show indexes is 66343 for column id and for column ts.

select count(distinct id) from t; # --> 32770

Related Oracle bug reports

https://bugs.mysql.com/bug.php?id=60023
https://bugs.mysql.com/bug.php?id=87670



 Comments   
Comment by Elena Stepanova [ 2017-09-07 ]

MySQL 5.7 shows Using index for group-by with or without the hint:

MySQL 5.7

MySQL [test]> explain select id, max(ts) from t group by id;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | t     | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 32178 |   100.00 | Using index for group-by |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

Comment by Sergei Petrunia [ 2018-11-02 ]

It actually does use index for GROUP BY. Reading data through index "PRIMARY" allows to read one GROUP BY group after another, and the optimizer is taking advantage of that.

Compare:

mysql> explain select id, max(ts) from t group by id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 63681
        Extra: Using index
1 row in set (0.00 sec)

Changing the GROUP BY clause so the optimizer doesn't know the index is usable:

mysql> explain select id, max(ts) from t group by id+1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 63681
        Extra: Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

Comment by Sergei Petrunia [ 2018-11-02 ]

Maybe, the actual question of this bug report was, why is this query not using Loose Scan to resolve the GROUP BY. This is a valid question. I suspect the cost calculations, but let me check that.

Comment by Sergei Petrunia [ 2018-11-02 ]

The report says

Cardinality in show indexes is 66343 for column id and for column ts.

select count(distinct id) from t; # --> 32770

Well, if there are 32K GROUP BY groups with an average of 2 (TWO) rows per group, Loose Scan will not provide any speedup.

Comment by Sergei Petrunia [ 2018-11-02 ]

And btw if I get the GROUP BY group larger, it will pick Loose Scan:

create table t2 (id int auto_increment, ts timestamp not null, primary key(id,ts)) engine=innodb;
set @a=0;
 
mysql> insert into t2 select (@a:=@a+1), '2010-01-01' from t limit 1000;
Query OK, 1000 rows affected (0.01 sec)
Records: 1000  Duplicates: 0  Warnings: 0
 
mysql> insert into t2 select t2.id, date_add('2012-01-01', interval SIXTY_FOUR.id day) from t2, (select id from t2 order by id limit 64) as SIXTY_FOUR;
Query OK, 64000 rows affected (0.28 sec)
Records: 64000  Duplicates: 0  Warnings: 0
 
analyze table t2;

mysql>  explain select id, max(ts) from t2 group by id;
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
|    1 | SIMPLE      | t2    | range | NULL          | PRIMARY | 4       | NULL | 1732 | Using index for group-by |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

Comment by Sergei Petrunia [ 2018-11-02 ]

Closing as Not a bug.
Feel free to open if you still think something doesn't work right.
(If you think we should have a hint to force Loose Scan, and actually have a use for it - please file a new MDEV with a feature request)

Generated at Thu Feb 08 08:08:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.