[MDEV-10146] Wrong result (or questionable result and behavior) with aggregate function in uncorrelated SELECT subquery Created: 2016-05-27  Updated: 2017-10-02  Resolved: 2017-07-05

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.1.26, 5.5.57, 10.0.32, 10.2.7

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-7857 Assertion `info->last_key.keyinfo == ... Closed
Sprint: 10.2.7-1

 Description   

Please consider the scenario below. I don't know whether my expectations are even correct, because MySQL, Oracle and PostgreSQL all behave differently in this case, and none does exactly what I expect.

Output from the CLI

MariaDB [test]> INSERT INTO t2 VALUES (3);
Query OK, 1 row affected (0.00 sec)
 
MariaDB [test]> --echo # This looks wrong. t1 has 2 rows, so the result set should have 2 rows too,
MariaDB [test]> --echo # but it only has 1:
MariaDB [test]> SELECT ( SELECT MAX(f1) FROM t2 ) FROM t1;
+----------------------------+
| ( SELECT MAX(f1) FROM t2 ) |
+----------------------------+
|                          2 |
+----------------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> 
MariaDB [test]> --echo # This also looks wrong. Now there are two rows in the result set,
MariaDB [test]> --echo # but they are different, which should not happen, as we select MAX:
MariaDB [test]> SELECT ( SELECT MAX(f1) FROM t2 ) FROM v1;
+----------------------------+
| ( SELECT MAX(f1) FROM t2 ) |
+----------------------------+
|                          1 |
|                          2 |
+----------------------------+
2 rows in set (0.00 sec)
 
MariaDB [test]> 
MariaDB [test]> INSERT INTO t2 VALUES (4);
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> 
MariaDB [test]> --echo # This shouldn't even work, because the subquery returns more than 1 row,
MariaDB [test]> --echo # but it works and returns wrong result, same as before:
MariaDB [test]> SELECT ( SELECT MAX(f1) FROM t2 ) FROM v1;
+----------------------------+
| ( SELECT MAX(f1) FROM t2 ) |
+----------------------------+
|                          1 |
|                          2 |
+----------------------------+
2 rows in set (0.00 sec)
 
MariaDB [test]> 
MariaDB [test]> --echo # This returns the error as expected:
MariaDB [test]> SELECT ( SELECT MAX(f1) FROM t2 ) FROM t1;
ERROR 1242 (21000): Subquery returns more than 1 row

Clean test case (for copy-paste)

--disable_abort_on_error
 
CREATE TABLE t1 (f1 INT);
CREATE VIEW v1 AS SELECT * FROM t1;
INSERT INTO t1 VALUES (1),(2);
 
CREATE TABLE t2 (f2 int);
 
INSERT INTO t2 VALUES (3);
--echo # This looks wrong. t1 has 2 rows, so the result set should have 2 rows too,
--echo # but it only has 1:
SELECT ( SELECT MAX(f1) FROM t2 ) FROM t1;
 
--echo # This also looks wrong. Now there are two rows in the result set,
--echo # but they are different, which should not happen, as we select MAX:
SELECT ( SELECT MAX(f1) FROM t2 ) FROM v1;
 
INSERT INTO t2 VALUES (4);
 
--echo # This shouldn't even work, because the subquery returns more than 1 row,
--echo # but it works and returns wrong result, same as before:
SELECT ( SELECT MAX(f1) FROM t2 ) FROM v1;
 
--echo # This returns the error as expected:
SELECT ( SELECT MAX(f1) FROM t2 ) FROM t1;



 Comments   
Comment by Oleksandr Byelkin [ 2017-06-22 ]

subquery ( SELECT MAX(f1) FROM t2 ) with no GROUP BY should return only one row always

Comment by Oleksandr Byelkin [ 2017-06-22 ]

The problem do not depend on MAX specific, the same is with SUM for example.

Comment by Oleksandr Byelkin [ 2017-06-22 ]

EXPLAIN EXTENDED explain why the result is correct:
explain extended
SELECT ( SELECT SUM(f1) FROM t2 ) FROM t1;
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 2 100.00
2 DEPENDENT SUBQUERY t2 system NULL NULL NULL NULL 1100.00
Warnings:
Note 1276 Field or reference 'test.t1.f1' of SELECT #2 was resolved in SELECT #1
Note 1981 Aggregate function 'sum()' of SELECT #2 belongs to SELECT #1
Note 1003 select <expr_cache><`test`.`t1`.`f1`,sum(`test`.`t1`.`f1`),sum(`test`.`t1`.`f1`)>((select sum(`test`.`t1`.`f1`) from dual)) AS `( SELECT SUM(f1) FROM t2 )` from `test`.`t1`

Comment by Oleksandr Byelkin [ 2017-06-22 ]

The problem is that even explicit GROUP BY ignored:
SELECT ( SELECT SUM(f1) FROM t2 GROUP BY "constant" ) FROM t1;
( SELECT SUM(f1) FROM t2 GROUP BY "constant" )
3

(probably also checks JOIN::group_list but not JOIN::simple_grouping)

Comment by Oleksandr Byelkin [ 2017-06-22 ]

Result of discussing with Igor:
first is correct.
Everythoing else incorrect:
2 - with view it also should move aggregation in the top SELECT
3 /4 - if aggregation moved in top SELECT then inner one should return 2 records (error)
my example - explicit GROUP BY should prevent moving aggregation up.

Comment by Igor Babaev [ 2017-06-22 ]

Some observations what postgresql returns for these queries:

SELECT ( SELECT MAX(f1) FROM t2 ) FROM t1;
max
2
SELECT ( SELECT MAX(f1) FROM t2 ) FROM v1
max
2

So, here postgresql (and most probably the Standard) agrees with Sanja.

Another observation:

MariaDB [test]> SELECT * FROM t2 GROUP BY f2 having 3 in ( SELECT sum(t2.f2) FROM t1 );
 
MariaDB [test]> SELECT * FROM t2 GROUP BY f2 having 2 in ( SELECT sum(1) FROM t1 );
+------+
| f2   |
+------+
|    3 |
+------+

The first query returns a wrong result set, while the second works fine.
Postgresql agrees with me:

SELECT * FROM t2 GROUP BY f2 having 3 in ( SELECT sum(t2.f2) FROM t1 );
f2
3
SELECT * FROM t2 GROUP BY f2 having 2 in ( SELECT sum(1) FROM t1 );
f2
3

So, probably this is another bug.

Comment by Oleksandr Byelkin [ 2017-06-23 ]

error with not returning 2 records connected to not moving subselect in top subselect

Comment by Oleksandr Byelkin [ 2017-06-26 ]

Bug with view resolved: outer field view resolving did not set max_arg_level.

diff --git a/sql/item.cc b/sql/item.cc
index 61a85f6d487..6eeabdd0bc7 100644
--- a/sql/item.cc
+++ b/sql/item.cc
@@ -5237,6 +5237,13 @@ Item_field::fix_outer_field(THD *thd, Field **from_field, Item **reference)
                             ((ref_type == REF_ITEM || ref_type == FIELD_ITEM) ?
                              (Item_ident*) (*reference) :
                              0));
+          if (thd->lex->in_sum_func &&
+              thd->lex->in_sum_func->nest_level >= select->nest_level)
+          {
+            Item::Type ref_type= (*reference)->type();
+            set_if_bigger(thd->lex->in_sum_func->max_arg_level,
+                          select->nest_level);
+          }
           /*
             A reference to a view field had been found and we
             substituted it instead of this Item (find_field_in_tables

Comment by Oleksandr Byelkin [ 2017-06-26 ]

revision-id: 7b5a04c14fb93fa6ebeaad4b40f028bf547bce14 (mariadb-10.2.6-64-g7b5a04c14fb)
parent(s): 0288fa619f4cf95dd4725f16c11804cbd2b3bbab
committer: Oleksandr Byelkin
timestamp: 2017-06-26 11:57:55 +0200
message:

MDEV-10146: Wrong result (or questionable result and behavior) with aggregate function in uncorrelated SELECT subquery

When outer reference resolved in a VIEW it still should mark aggregate function resolving border.

Comment by Igor Babaev [ 2017-07-04 ]

Sanja,

Ok to push the patch into 5.5.

Comment by Oleksandr Byelkin [ 2017-07-05 ]

2 them who will merge: there will be problem with subselect*.test. Solution is in commit above (for 10.2 not for 5.5).

Comment by Sergei Golubchik [ 2017-07-06 ]

"The commit above" is http://lists.askmonty.org/pipermail/commits/2017-June/011271.html

Generated at Thu Feb 08 07:40:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.