[MDEV-24353] Adding GROUP BY slows down a query Created: 2020-12-06  Updated: 2020-12-15  Resolved: 2020-12-11

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.3.27, 10.4.17, 10.5.8, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.6.0

Type: Bug Priority: Major
Reporter: Jean-François Gagné Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: upstream


 Description   

Hi,

this is to report that MariaDB 10.5.8 is also affected by upstream Bug#101838 [1]. Look at the Description, How to repeat and three first comment to understand what I share below (I mixed-up ORDER BY and GROUP BY in the MySQL bug report, I fixed it later for the title, but I cannot fix it for the first comments, sorry about that).

[1]: https://jira.mariadb.org/browse/MDEV-11852

Note that I am able also able to reproduce on 10.4.17 and 10.3.27.

Many thanks for looking into this,

Jean-François Gagné

mysql [localhost:10508] {msandbox} (test_jfg) > select version();
+----------------+
| version()      |
+----------------+
| 10.5.8-MariaDB |
+----------------+
1 row in set (0.000 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > show create table p\G
*************************** 1. row ***************************
       Table: p
Create Table: CREATE TABLE `p` (
  `p` int(10) unsigned NOT NULL,
  `a` int(10) unsigned NOT NULL,
  `c` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`p`,`a`,`c`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
1 row in set (0.000 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > pager cat > /dev/null
PAGER set to 'cat > /dev/null'
mysql [localhost:10508] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a group by p, a;select MIN(c) from p where p = @p and a = @a group by p, a;
1 row in set (0.282 sec)
 
1 row in set (0.261 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a;select MIN(c) from p where p = @p and a = @a;
1 row in set (0.000 sec)
 
1 row in set (0.000 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a group by p, a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.261 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Handler_read_first       | 0      |
| Handler_read_key         | 1      |
| Handler_read_last        | 0      |
| Handler_read_next        | 631283 |
| Handler_read_prev        | 0      |
| Handler_read_retry       | 0      |
| Handler_read_rnd         | 0      |
| Handler_read_rnd_deleted | 0      |
| Handler_read_rnd_next    | 0      |
+--------------------------+--------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.000 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 1     |
| Handler_read_last        | 0     |
| Handler_read_next        | 0     |
| Handler_read_prev        | 0     |
| Handler_read_retry       | 0     |
| Handler_read_rnd         | 0     |
| Handler_read_rnd_deleted | 0     |
| Handler_read_rnd_next    | 0     |
+--------------------------+-------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select p, a, MIN(c) from p where p = @p and a = @a group by p, a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.271 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Handler_read_first       | 0      |
| Handler_read_key         | 1      |
| Handler_read_last        | 0      |
| Handler_read_next        | 631283 |
| Handler_read_prev        | 0      |
| Handler_read_retry       | 0      |
| Handler_read_rnd         | 0      |
| Handler_read_rnd_deleted | 0      |
| Handler_read_rnd_next    | 0      |
+--------------------------+--------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select p, a, MIN(c) from p where p = @p and a = @a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.255 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Handler_read_first       | 0      |
| Handler_read_key         | 2      |
| Handler_read_last        | 0      |
| Handler_read_next        | 631283 |
| Handler_read_prev        | 0      |
| Handler_read_retry       | 0      |
| Handler_read_rnd         | 0      |
| Handler_read_rnd_deleted | 0      |
| Handler_read_rnd_next    | 0      |
+--------------------------+--------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.000 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 1     |
| Handler_read_last        | 0     |
| Handler_read_next        | 0     |
| Handler_read_prev        | 0     |
| Handler_read_retry       | 0     |
| Handler_read_rnd         | 0     |
| Handler_read_rnd_deleted | 0     |
| Handler_read_rnd_next    | 0     |
+--------------------------+-------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > show create table p_\G
*************************** 1. row ***************************
       Table: p_
Create Table: CREATE TABLE `p_` (
  `p` int(10) unsigned NOT NULL,
  `a` int(10) unsigned NOT NULL,
  `s` varchar(20) NOT NULL,
  `t` varchar(20) NOT NULL,
  `i` int(10) unsigned DEFAULT NULL,
  `c` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`p`,`a`,`c`),
  KEY `t` (`t`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
1 row in set (0.000 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select t, p, a, MIN(c) from p_ where p = @p and a = @a and t = @t group by t, p, a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.454 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Handler_read_first       | 0      |
| Handler_read_key         | 318157 |
| Handler_read_last        | 0      |
| Handler_read_next        | 631283 |
| Handler_read_prev        | 0      |
| Handler_read_retry       | 0      |
| Handler_read_rnd         | 1      |
| Handler_read_rnd_deleted | 0      |
| Handler_read_rnd_next    | 2      |
+--------------------------+--------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select t, p, a, MIN(c) from p_ where p = @p and a = @a and t = @t; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.320 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Handler_read_first       | 0      |
| Handler_read_key         | 1      |
| Handler_read_last        | 0      |
| Handler_read_next        | 631283 |
| Handler_read_prev        | 0      |
| Handler_read_retry       | 0      |
| Handler_read_rnd         | 0      |
| Handler_read_rnd_deleted | 0      |
| Handler_read_rnd_next    | 0      |
+--------------------------+--------+
9 rows in set (0.001 sec)
 
mysql [localhost:10508] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p_ where p = @p and a = @a and t = @t; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.000 sec)
 
PAGER set to 'cat > /dev/null'
1 row in set (0.262 sec)
 
Default pager wasn't set, using stdout.
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Handler_read_first       | 0      |
| Handler_read_key         | 1      |
| Handler_read_last        | 0      |
| Handler_read_next        | 631283 |
| Handler_read_prev        | 0      |
| Handler_read_retry       | 0      |
| Handler_read_rnd         | 0      |
| Handler_read_rnd_deleted | 0      |
| Handler_read_rnd_next    | 0      |
+--------------------------+--------+
9 rows in set (0.001 sec)



 Comments   
Comment by Alice Sherepa [ 2020-12-07 ]

Thank you for the report!
I confirm that combination of where + group by for this cases is slower, on 5.5-10.5:

--source include/have_innodb.inc
--source include/have_sequence.inc
 
CREATE TABLE t1 (p int NOT NULL, a int NOT NULL, PRIMARY KEY (p,a));
insert into t1 select 2,seq from seq_0_to_1000;
 
analyze table t1;
 
set profiling=1;
 
select MIN(a) from t1 where p = 2 group by p;
select MIN(a) from t1 where p = 2;
select MIN(a) from t1 group by p;
 
show profiles;
 
analyze format=json select MIN(a) from t1 where p = 2 group by p;
analyze format=json select MIN(a) from t1 where p = 2;  #used min/max optimization
analyze format=json select MIN(a) from t1  group by p;  #used using_index_for_group_by
 
drop table t1;

10.2 f924a3bd6cd70c140f00d

set profiling=1;
select MIN(a) from t1 where p = 2 group by p;
MIN(a)
0
select MIN(a) from t1 where p = 2;
MIN(a)
0
select MIN(a) from t1 group by p;
MIN(a)
0
show profiles;
Query_ID	Duration	Query
1	0.00159900	select MIN(a) from t1 where p = 2 group by p
2	0.00025713	select MIN(a) from t1 where p = 2
3	0.00032281	select MIN(a) from t1 group by p
analyze format=json select MIN(a) from t1 where p = 2 group by p;
ANALYZE
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 1.0317,
    "table": {
      "table_name": "t1",
      "access_type": "ref",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["p"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 999,
      "r_rows": 1001,
      "r_total_time_ms": 0.6489,
      "filtered": 100,
      "r_filtered": 100,
      "using_index": true
    }
  }
}
analyze format=json select MIN(a) from t1 where p = 2;
ANALYZE
{
  "query_block": {
    "select_id": 1,
    "table": {
      "message": "Select tables optimized away"
    }
  }
}
analyze format=json select MIN(a) from t1  group by p;
ANALYZE
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.0393,
    "table": {
      "table_name": "t1",
      "access_type": "range",
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["p"],
      "r_loops": 1,
      "rows": 2,
      "r_rows": 1,
      "r_total_time_ms": 0.0133,
      "filtered": 100,
      "r_filtered": 100,
      "using_index_for_group_by": true
    }
  }
}

Comment by Varun Gupta (Inactive) [ 2020-12-08 ]

MariaDB [test]> EXPLAIN select MIN(a) from t1 where p = 2 group by p;
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id   | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|    1 | SIMPLE      | t1    | ref  | PRIMARY       | PRIMARY | 4       | const | 1000 | Using index |
+------+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.003 sec)
 
MariaDB [test]> EXPLAIN select MIN(a) from t1 where p = 2;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.002 sec)
 
MariaDB [test]> EXPLAIN select MIN(a) from t1 group by p;
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
|    1 | SIMPLE      | t1    | range | NULL          | PRIMARY | 4       | NULL | 10   | Using index for group-by |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.003 sec)

Comment by Sergei Petrunia [ 2020-12-09 ]

Takeaways from yesterday discussion:

The query (both variants, with and without the GROUP BY) can be resolved by doing a single index lookup:

h->index_read_map(key={p=2}, flag=HA_READ_KEY_OR_NEXT)

This will read the row that has p=2 and the minimum value of a.

The query without the GROUP BY is resolved by the opt_sum_query code.
opt_sum_query looks for cases where a grouping function can be resolved by doing just one lookup. It can only handle implicit grouping. It can't handle "WHERE col=const GROUP BY const".

The second opportunity to do efficient handling is group-min-max optimization.
The range optimizer actually does pick the group-min-max as the "best quick select" for the query.
However, the join then discards it in best_access_path in favor of ref access.
It does so, because it uses a heuristic that makes an invalid assumption:

ref access over N keyparts is better than a quick select over the same index that uses the same # key parts.

this is true for range accesses, but not necessarily true for group-min-max quick select.

Comment by Sergei Petrunia [ 2020-12-09 ]

Possible ways to fix:

  • Do not apply the heuristic for group-min-max quick selects.
  • Make opt_sum_query code handle the case where there is just one group (because of WHERE col=... GROUP BY col), and so some aggregate functions can be resolved by doing a single index lookup.
Comment by Varun Gupta (Inactive) [ 2020-12-09 ]

Patch
http://lists.askmonty.org/pipermail/commits/2020-December/014399.html

Comment by Sergei Petrunia [ 2020-12-10 ]

Review input: http://lists.askmonty.org/pipermail/commits/2020-December/014403.html

Ok to push after addressed.

Generated at Thu Feb 08 09:29:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.