[MDEV-20057] Distinct SUM on CROSS JOIN and grouped returns wrong result Created: 2019-07-12  Updated: 2023-07-26  Resolved: 2023-02-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.4.6, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.11.3, 11.0.1, 10.4.29, 10.5.20, 10.6.13, 10.7.8, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Major
Reporter: - Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: 11.0-sel, optimizer, upstream
Environment:

Ubuntu 18.04, running MariaDB 10.4.6 in Docker


Issue Links:
Problem/Incident
causes MDEV-31743 Server crash in store_length, asserti... Closed

 Description   

This sql query returns a wrong result:

CREATE SCHEMA A;
 
CREATE TABLE A.B (C INTEGER, D INTEGER);
 
INSERT INTO A.B VALUES (5, 1), (0, 3);
 
SELECT DISTINCT SUM(DISTINCT 1), SUM( B.D) > 2 FROM (A.B AS E CROSS JOIN A.B) GROUP BY B.C

Postgres 11.3 returns (1, 0) and (1,1), MariaDB in Version 10.4.6 and MySQL in Version 8.0.16 both only return (1, 1). Manually verified, the Postgres result is the correct one (group by results into two rows, sum > 2 is one for one group and false for the other. Therefore, I expect that MariaDB should also return 1,1 and 1,0.

Removing one DISTINCT, one of the columns or > 2 removes the issue. I reduced the testcase as much as possible, therefore I replaced columns with constant if possible. The program now looks a bit constructed, but with columns, the bug also occurs.



 Comments   
Comment by Alice Sherepa [ 2019-10-10 ]

Thanks for the report! Reproducible on 5.5-10.4:

create table t1 (c int, d int);
insert into t1 values (5, 1), (0, 3);
 
select distinct sum(distinct 1), sum(t1.d) > 2 , t1.c from (t1 e join t1) group by t1.c;
select distinct sum(distinct 1), sum(t1.d) > 2  from (t1 e join t1) group by t1.c;

MariaDB [test]> create table t1 (c int, d int);
Query OK, 0 rows affected (0.024 sec)
 
MariaDB [test]> insert into t1 values (5, 1), (0, 3);
Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select distinct sum(distinct 1), sum(t1.d) > 2 , t1.c
    -> from (t1 e join t1) group by t1.c;
+-----------------+---------------+------+
| sum(distinct 1) | sum(t1.d) > 2 | c    |
+-----------------+---------------+------+
|               1 |             1 |    0 |
|               1 |             0 |    5 |
+-----------------+---------------+------+
2 rows in set (0.001 sec)
 
MariaDB [test]> select distinct sum(distinct 1), sum(t1.d) > 2 
    -> from (t1 e join t1) group by t1.c;
+-----------------+---------------+
| sum(distinct 1) | sum(t1.d) > 2 |
+-----------------+---------------+
|               1 |             1 |
+-----------------+---------------+
1 row in set (0.001 sec)
## Should be (1,1), (1,0)
 
MariaDB [test]> explain extended select distinct sum(distinct 1), sum(t1.d) > 2  from (t1 e join t1) group by t1.c;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                              |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
|    1 | SIMPLE      | e     | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using temporary; Using filesort    |
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
2 rows in set, 1 warning (0.001 sec)
 
Note (Code 1003): select distinct sum(distinct 1) AS `sum(distinct 1)`,sum(`test`.`t1`.`d`) > 2 AS `sum(t1.d) > 2` from `test`.`t1` `e` join `test`.`t1` group by `test`.`t1`.`c`

Comment by Michael Widenius [ 2023-02-15 ]

This test case exposed a serious bug in MariaDB distinct handling.

What was not handled was SELECT DISTINCT on expressions with sum functions.
In other words:
SELECT DISTINCT sum(a),sum(b),avg(c) ... works.
SELECT DISTINCT sum(a),sum(b) > 2,sum(c)+sum(d) would not work.
The later one would do only apply distinct on the sum(a) part and ignore the others.

Will be fixed by extending remove_dup_with_hash_index() and remove_dup_with_compare() to take into account the columns in the result list and not only work on the fields in the internal temporary table (which only stores the values of the individual sum() operations).

Note that in many cases the above dup removal functions are not used as the optimizer may be able to either remove duplicates early or it will discover that duplicate remove is not needed, for example if the group by fields is part of the result.
This was the case if one removed distinct, or sum(distinct) or added t1.d to the result set in the above queries.

Comment by Michael Widenius [ 2023-02-16 ]

Waiting for review by Sergei Petrunia

Comment by Michael Widenius [ 2023-02-17 ]

Pushed to 10.4

Generated at Thu Feb 08 08:56:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.