[MDEV-30766] Issue with NOT IN (subquery that violates "FULL GROUP BY") Created: 2023-03-01 Updated: 2023-03-03 Resolved: 2023-03-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Pinhan Zhao | Assignee: | Sergei Golubchik |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Description |
|
With the following database:
After running Q1
the output is a table with one column and one row, and the only value is 0:
And we have Q2:
Q2 is basically a query of `SELECT SESSION_ID FROM PLAYBACK WHERE SESSION_ID NOT IN Q1` Therefore, we would expect it outputs one row with SESSION_ID of -1 . However, the output is an empty set. We can further investigate this by running Q3:
The output is:
We can see the result doesn't make sense, because -1 NOT IN (0) shouldn't be false. |
| Comments |
| Comment by Alice Sherepa [ 2023-03-02 ] | ||||||||||||||||||||||||
|
MariaDB does not have ANY_VALUE() function. Could you please add SHOW CREATE FUNCTION as it is in your database? | ||||||||||||||||||||||||
| Comment by Pinhan Zhao [ 2023-03-03 ] | ||||||||||||||||||||||||
|
My apologies – the ANY_VALUE functions shouldn't have been in the queries. I've included the revised complete example below:
And without the ANY_VALUE functions we're then able to reproduce this issue. Thanks. | ||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-03-03 ] | ||||||||||||||||||||||||
|
I tend to qualify it as a not a bug. For a query to have well-defined result, SESSION_ID must have a functional dependency on CUSTOMER_ID. In other words, if two rows have the same value of CUSTOMER_ID they must have the same value of SESSION_ID. The way the query is written now it's basically invalid and most RDBMS would throw an error and would not execute it. MariaDB assumes that the query is valid and have well-defined result, and it optimizes it under this assumption. If you run EXPLAIN EXTENDED you'll see what exactly the optimizer is doing — it omits GROUP BY in a subquery. One possible way of making this work would be implementing ANY_VALUE() function in MariaDB. This function would make the query valid again by basically telling the optimizer "the column SESSION_ID might have different values in the same group". And then optimizer would know not to perform optimizations that rely on all values being the same (like omitting GROUP BY). ANY_VALUE() will be implemented in MDEV-10426 |