Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20057

Distinct SUM on CROSS JOIN and grouped returns wrong result

Details

    Description

      This sql query returns a wrong result:

      CREATE SCHEMA A;
       
      CREATE TABLE A.B (C INTEGER, D INTEGER);
       
      INSERT INTO A.B VALUES (5, 1), (0, 3);
       
      SELECT DISTINCT SUM(DISTINCT 1), SUM( B.D) > 2 FROM (A.B AS E CROSS JOIN A.B) GROUP BY B.C
      

      Postgres 11.3 returns (1, 0) and (1,1), MariaDB in Version 10.4.6 and MySQL in Version 8.0.16 both only return (1, 1). Manually verified, the Postgres result is the correct one (group by results into two rows, sum > 2 is one for one group and false for the other. Therefore, I expect that MariaDB should also return 1,1 and 1,0.

      Removing one DISTINCT, one of the columns or > 2 removes the issue. I reduced the testcase as much as possible, therefore I replaced columns with constant if possible. The program now looks a bit constructed, but with columns, the bug also occurs.

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment - - edited

            Thanks for the report! Reproducible on 5.5-10.4:

            create table t1 (c int, d int);
            insert into t1 values (5, 1), (0, 3);
             
            select distinct sum(distinct 1), sum(t1.d) > 2 , t1.c from (t1 e join t1) group by t1.c;
            select distinct sum(distinct 1), sum(t1.d) > 2  from (t1 e join t1) group by t1.c;
            

            MariaDB [test]> create table t1 (c int, d int);
            Query OK, 0 rows affected (0.024 sec)
             
            MariaDB [test]> insert into t1 values (5, 1), (0, 3);
            Query OK, 2 rows affected (0.001 sec)
            Records: 2  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> select distinct sum(distinct 1), sum(t1.d) > 2 , t1.c
                -> from (t1 e join t1) group by t1.c;
            +-----------------+---------------+------+
            | sum(distinct 1) | sum(t1.d) > 2 | c    |
            +-----------------+---------------+------+
            |               1 |             1 |    0 |
            |               1 |             0 |    5 |
            +-----------------+---------------+------+
            2 rows in set (0.001 sec)
             
            MariaDB [test]> select distinct sum(distinct 1), sum(t1.d) > 2 
                -> from (t1 e join t1) group by t1.c;
            +-----------------+---------------+
            | sum(distinct 1) | sum(t1.d) > 2 |
            +-----------------+---------------+
            |               1 |             1 |
            +-----------------+---------------+
            1 row in set (0.001 sec)
            ## Should be (1,1), (1,0)
             
            MariaDB [test]> explain extended select distinct sum(distinct 1), sum(t1.d) > 2  from (t1 e join t1) group by t1.c;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                              |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
            |    1 | SIMPLE      | e     | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using temporary; Using filesort    |
            |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using join buffer (flat, BNL join) |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
            2 rows in set, 1 warning (0.001 sec)
             
            Note (Code 1003): select distinct sum(distinct 1) AS `sum(distinct 1)`,sum(`test`.`t1`.`d`) > 2 AS `sum(t1.d) > 2` from `test`.`t1` `e` join `test`.`t1` group by `test`.`t1`.`c`
            
            

            alice Alice Sherepa added a comment - - edited Thanks for the report! Reproducible on 5.5-10.4: create table t1 (c int , d int ); insert into t1 values (5, 1), (0, 3);   select distinct sum ( distinct 1), sum (t1.d) > 2 , t1.c from (t1 e join t1) group by t1.c; select distinct sum ( distinct 1), sum (t1.d) > 2 from (t1 e join t1) group by t1.c; MariaDB [test]> create table t1 (c int, d int); Query OK, 0 rows affected (0.024 sec)   MariaDB [test]> insert into t1 values (5, 1), (0, 3); Query OK, 2 rows affected (0.001 sec) Records: 2 Duplicates: 0 Warnings: 0   MariaDB [test]> select distinct sum(distinct 1), sum(t1.d) > 2 , t1.c -> from (t1 e join t1) group by t1.c; +-----------------+---------------+------+ | sum(distinct 1) | sum(t1.d) > 2 | c | +-----------------+---------------+------+ | 1 | 1 | 0 | | 1 | 0 | 5 | +-----------------+---------------+------+ 2 rows in set (0.001 sec)   MariaDB [test]> select distinct sum(distinct 1), sum(t1.d) > 2 -> from (t1 e join t1) group by t1.c; +-----------------+---------------+ | sum(distinct 1) | sum(t1.d) > 2 | +-----------------+---------------+ | 1 | 1 | +-----------------+---------------+ 1 row in set (0.001 sec) ## Should be (1,1), (1,0)   MariaDB [test]> explain extended select distinct sum(distinct 1), sum(t1.d) > 2 from (t1 e join t1) group by t1.c; +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ | 1 | SIMPLE | e | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ 2 rows in set, 1 warning (0.001 sec)   Note (Code 1003): select distinct sum(distinct 1) AS `sum(distinct 1)`,sum(`test`.`t1`.`d`) > 2 AS `sum(t1.d) > 2` from `test`.`t1` `e` join `test`.`t1` group by `test`.`t1`.`c`

            This test case exposed a serious bug in MariaDB distinct handling.

            What was not handled was SELECT DISTINCT on expressions with sum functions.
            In other words:
            SELECT DISTINCT sum(a),sum(b),avg(c) ... works.
            SELECT DISTINCT sum(a),sum(b) > 2,sum(c)+sum(d) would not work.
            The later one would do only apply distinct on the sum(a) part and ignore the others.

            Will be fixed by extending remove_dup_with_hash_index() and remove_dup_with_compare() to take into account the columns in the result list and not only work on the fields in the internal temporary table (which only stores the values of the individual sum() operations).

            Note that in many cases the above dup removal functions are not used as the optimizer may be able to either remove duplicates early or it will discover that duplicate remove is not needed, for example if the group by fields is part of the result.
            This was the case if one removed distinct, or sum(distinct) or added t1.d to the result set in the above queries.

            monty Michael Widenius added a comment - This test case exposed a serious bug in MariaDB distinct handling. What was not handled was SELECT DISTINCT on expressions with sum functions. In other words: SELECT DISTINCT sum(a),sum(b),avg(c) ... works. SELECT DISTINCT sum(a),sum(b) > 2,sum(c)+sum(d) would not work. The later one would do only apply distinct on the sum(a) part and ignore the others. Will be fixed by extending remove_dup_with_hash_index() and remove_dup_with_compare() to take into account the columns in the result list and not only work on the fields in the internal temporary table (which only stores the values of the individual sum() operations). Note that in many cases the above dup removal functions are not used as the optimizer may be able to either remove duplicates early or it will discover that duplicate remove is not needed, for example if the group by fields is part of the result. This was the case if one removed distinct, or sum(distinct) or added t1.d to the result set in the above queries.

            Waiting for review by Sergei Petrunia

            monty Michael Widenius added a comment - Waiting for review by Sergei Petrunia

            Pushed to 10.4

            monty Michael Widenius added a comment - Pushed to 10.4

            People

              monty Michael Widenius
              Alicen -
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.