[MDEV-4595] GROUP BY in subqueries removed too aggressively Created: 2013-05-28  Updated: 2022-09-08

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.2, 5.5.31, 5.3.12
Fix Version/s: 5.5

Type: Bug Priority: Minor
Reporter: Sergei Golubchik Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Since 5.3 MariaDB supports this optimization:

This optimization, although correct in strict SQL standard sense, does not always work when the subquery uses non-aggregated columns in the SELECT clause. This is non-standard MariaDB extension, but as long as it is supported, the optimizer should take it into account.

MariaDB [test]> create table t1 (a int, b int);
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> insert t1 values (1,1),(2,1);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select a from t1 group by b;
+------+
| a    |
+------+
|    1 |
+------+
1 row in set (0.01 sec)
 
MariaDB [test]> select count(*) from t1 where a in (select a from t1);
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
 
MariaDB [test]> select count(*) from t1 where a in (select a from t1 group by b);
+----------+
| count(*) |
+----------+
|        2 |   <-- this is wrong!
+----------+
1 row in set (0.00 sec)
 
MariaDB [test]> select count(*) from t1 where a in (select min(a) from t1 group by b);
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

As a possible fix, the optimizer could disable this optimization when non-aggregated columns are selected. It already performs all necessary checks for the ONLY_FULL_GROUP_BY sql mode.



 Comments   
Comment by Sergei Petrunia [ 2013-05-28 ]

The manual says http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html "The server is free to choose any value from each group".

Let's remember that and look at the query and dataset from this bugreport:

select a from t1 group by b

a=1 is a valid result. a=2 is also a valid result.

Then, lets consider a query:

select count from t1 where a in (select a from t1);

t1 has two records:
R1. record with a=1
R2. record with a=2

for record R1, "a in (select a from t1)" can be satisfied.
for record R2, "a in (select a from t1)" can be satisfied.

hence, "select count ... " may produce "2". It may also produce "1", or "0" (If this seems odd to you, consider a query which has "LIMIT 1" but no ORDER BY or GROUP BY. It can return different results, and all of them will be valid)

Comment by Sergei Petrunia [ 2013-05-28 ]

Maybe, you've meant to say that

"The server is free to choose any value from each group".

should be read as

"The server is free to choose any value from each group. The choice must be made once and then remain fixed for the duration of the query"

?

Comment by Sergei Golubchik [ 2013-05-28 ]

In a sense, yes. This query

select a from t1 group by b

can return either 1 or 2.

Thus,

select a from t1 where a in (select a from t1 group by b)

will be one of

select a from t1 where a in (1)
select a from t1 where a in (2)

how many times the optimizer will reevaluate the subquery is its internal implementation detail. It can do that as many times as it wants, but without breaking the semantics of the query.

Generated at Thu Feb 08 06:57:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.