[MDEV-30605] Wrong result while using index for group-by Created: 2023-02-07  Updated: 2023-04-25  Resolved: 2023-04-18

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 11.1.1, 10.11.3, 11.0.2, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: 11.0-sel

Issue Links:
Problem/Incident
is caused by MDEV-23634 Select query hanged the server and le... Closed

 Description   

--source include/have_innodb.inc
 
create table t (pk int primary key, a int, key(a)) engine=InnoDB;
insert into t values (1,NULL),(2,8),(3,5),(4,NULL),(5,10);
 
SELECT MIN(pk), a FROM t WHERE pk <> 1 GROUP BY a;
 
# Cleanup
drop table t;

10.5 acd23da4

--source include/have_innodb.inc
 
create table t (pk int primary key, a int, key(a)) engine=InnoDB;
insert into t values (1,NULL),(2,8),(3,5),(4,NULL),(5,10);
 
SELECT MIN(pk), a FROM t WHERE pk <> 1 GROUP BY a;
 
# Cleanup
drop table t;

10.5 acd23da4

SELECT MIN(pk), a FROM t WHERE pk <> 1 GROUP BY a;
MIN(pk)	a
3	5
2	8
5	10
EXPLAIN EXTENDED SELECT MIN(pk), a FROM t WHERE pk <> 1 GROUP BY a;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t	range	PRIMARY	a	5	NULL	6	100.00	Using where; Using index for group-by
Warnings:
Note	1003	select min(`test`.`t`.`pk`) AS `MIN(pk)`,`test`.`t`.`a` AS `a` from `test`.`t` where `test`.`t`.`pk` <> 1 group by `test`.`t`.`a`

A row for (4,NULL) is missing.

10.4 doesn't use index for group-by, and the result there is correct.



 Comments   
Comment by Rex Johnston [ 2023-02-20 ]

The issue is in how aggregate rows are read in and filtered.

Thread 14 "mariadbd" hit Breakpoint 4, rr_quick (info=0x7fff9801a8a8) at /home/rex/src/mariadb/server.30605/sql/records.cc:403
403	  while ((tmp= info->select->quick->get_next()))
(gdb) n
408	  return tmp;
(gdb) p info->table
$63 = (TABLE *) 0x7fff98066f88
(gdb) p dbp(info->table)
$64 = 0x5555577e2300 <dbug_print_row_buff> "t(pk,a)=(1,NULL)"
(gdb) cont
Continuing.
 
Thread 14 "mariadbd" hit Breakpoint 4, rr_quick (info=0x7fff9801a8a8) at /home/rex/src/mariadb/server.30605/sql/records.cc:403
403	  while ((tmp= info->select->quick->get_next()))
(gdb) n
[New Thread 0x7fffba7fc700 (LWP 15823)]
408	  return tmp;
(gdb) p dbp(info->table)
$65 = 0x5555577e2300 <dbug_print_row_buff> "t(pk,a)=(3,5)"
(gdb) cont
Continuing.
 
Thread 14 "mariadbd" hit Breakpoint 4, rr_quick (info=0x7fff9801a8a8) at /home/rex/src/mariadb/server.30605/sql/records.cc:403
403	  while ((tmp= info->select->quick->get_next()))
(gdb) n
408	  return tmp;
(gdb) p dbp(info->table)
$66 = 0x5555577e2300 <dbug_print_row_buff> "t(pk,a)=(2,8)"
(gdb) cont
Continuing.
 
Thread 14 "mariadbd" hit Breakpoint 4, rr_quick (info=0x7fff9801a8a8) at /home/rex/src/mariadb/server.30605/sql/records.cc:403
403	  while ((tmp= info->select->quick->get_next()))
(gdb) n
408	  return tmp;
(gdb) p dbp(info->table)
$67 = 0x5555577e2300 <dbug_print_row_buff> "t(pk,a)=(5,10)"
(gdb) cont
Continuing.

It appears to skip past the rest of the group (a==NULL) without further examination.

Comment by Rex Johnston [ 2023-02-21 ]

drop table if exists t;
create table t (pk int primary key, a int, key(a)) engine=InnoDB;
insert into t values (1,-1),(2,8),(3,5),(4,-1),(5,10), (6,-1);
 
SELECT MIN(pk), a FROM t WHERE pk <> 1 GROUP BY a;
 
# Cleanup
drop table t;

produces

MIN(pk)	a
3	5
2	8
5	10

It would appear that the select condition (pk <> 1) is being incorrectly passed into into make_join_select() and eliminating the whole group during the join.

Comment by Rex Johnston [ 2023-02-22 ]

QUICK_GROUP_MIN_MAX_SELECT::next_prefix() calls ha_index_first() with a null search key. This can't be right.
Edit: not useful comment. Ignore.

Comment by Rex Johnston [ 2023-03-14 ]

Hi Sergei, this seems to do the trick and is a minimal change. Thanks, Rex

Comment by Sergei Petrunia [ 2023-03-27 ]

Writing down take-aways from an earlier optimizer call:

This bug is caused by this patch:

commit c03841ec0e2b7ac11711243d29821038b26e3edf
Author: Sergei Petrunia <psergey@askmonty.org>
Date:   Thu Apr 8 17:25:02 2021 +0300
 
    MDEV-23634: Select query hanged the server and leads to OOM ...
    
    Handle "col<>const" in the same way that MDEV-21958 did for
    "col NOT IN(const-list)": do not use the condition for range/index_merge
    accesses if there is a unique UNIQUE KEY(col).
    
    The testcase is in main/range.test. The rest of test updates are
    due to widespread use of 'pk<>1' in the testsuite. Changed the test
    to use different but equivalent forms of the conditions.

That patch makes the optimizer to not construct an interval for condition

pk <> 1 

then, QUICK_GROUP_MIN_MAX_SELECT code ignores this condition and returns a row with pk=1 which is filtered out when checking the WHERE condition.

Comment by Sergei Petrunia [ 2023-03-27 ]

The idea of QUICK_GROUP_MIN_MAX_SELECT is that it is able to use ranges to jump to the row that has the value of MIN() or MAX() that we're interested in.

For the query in this MDEV:

SELECT MIN(pk), a FROM t WHERE pk <> 1 GROUP BY a;

it uses the INDEX(a, pk) to:

  • locate the first value group of a on the first iteration (On the non-first iteration, locate the next value).
  • Within this value group, jump to the first value of pk. It looks for the first value because we're computing MIN(pk).

The second step requires that conditions on pk are converted into ranges that allow us to jump to the matching value.
QUICK_GROUP_MIN_MAX_SELECT code checks this as follows:

      SA3. The attribute C can be referenced in the WHERE clause only in
           predicates of the forms:
           - (C {< | <= | > | >= | =} const)
           - (const {< | <= | > | >= | =} C)
           - (C between const_i and const_j)
           - C IS NULL
           - C IS NOT NULL
           - C != const

...

      The function walks recursively over the cond tree representing a WHERE
      clause, and checks condition (SA3) - if a field is referenced by a MIN/MAX
      aggregate function, it is referenced only by one of the following
      predicates $FUNC$:
      {=, !=, <, <=, >, >=, between, is [not] null, multiple equal}.

This logic was fine but broke down when the fix for MDEV-23634 made "pk <>1" non-sargable.

Comment by Sergei Petrunia [ 2023-03-28 ]

https://github.com/MariaDB/server/commit/42aea1990d32cc31d9b2c0eabe7e58bcf15475ca .

oleg.smirnov could you review this?

Comment by Oleg Smirnov [ 2023-04-15 ]

After minor fix (group_min_max.result) it is ok to push.

Generated at Thu Feb 08 10:17:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.