[MDEV-9445] Wrong result for a grouping query with DISTINCT Created: 2016-01-22  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Igor Babaev Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: 11.0-sel


 Description   

When working on mdev-8646 I discovered that in in mariadb-10.0/10.1.
the query

SELECT DISTINCT (COUNT(*) + 1) AS c FROM t1 GROUP BY a

always returns only 1 row.
This can be demonstrated with the following test case:

create table t1 (a int, b int);
insert into t1 values (1,3), (2,4), (1,5), (1,3), (2,1), (1,5), (1,7), (3,1), (3,2), (3,1), (2,4);
SELECT (COUNT(*) + 1) AS c FROM t1 GROUP BY a;
SELECT DISTINCT (COUNT(*) + 1) AS c FROM t1 GROUP BY a;

The first query from the test case returns the correct result set:

MariaDB [test]> SELECT (COUNT(*) + 1) AS c FROM t1 GROUP BY a;
+---+
| c |
+---+
| 6 |
| 4 |
| 4 |
+---+

The second query returns only 1 row:

MariaDB [test]> SELECT DISTINCT (COUNT(*) + 1) AS c FROM t1 GROUP BY a;
+---+
| c |
+---+
| 6 |
+---+

Interesting that the query

 
SELECT DISTINCT (COUNT(distinct b) + 1) AS c FROM t1 GROUP BY a;

returns the correct result set:

MariaDB [test]> SELECT DISTINCT (COUNT(distinct b) + 1) AS c FROM t1 GROUP BY a;
+---+
| c |
+---+
| 4 |
| 3 |
+---+

I did not check the earlier versions of mariadb.



 Comments   
Comment by Igor Babaev [ 2016-01-22 ]

MySQL 5.7/5.6 returns the result sets that differ.
This one is correct:

mysql> SELECT DISTINCT (COUNT(*) + 1) AS c FROM t1 GROUP BY a;
+---+
| c |
+---+
| 6 |
| 4 |
+---+

This one is wrong:

mysql> SELECT DISTINCT (COUNT(distinct b) + 1) AS c FROM t1 GROUP by a;
+---+
| c |
+---+
| 0 |
+---+

Comment by Sergei Petrunia [ 2016-01-27 ]

<igor3> no, the problem is only with count(*)
<igor3> and the regression is due to:
<igor3>     /*
<igor3>       MariaDB: don't run the following {
<igor3>       
<igor3>       used_tables_cache&= PSEUDO_TABLE_BITS;
<igor3>       // the aggregate function is aggregated into its local context
<igor3>       used_tables_cache|= ((table_map)1 << aggr_sel->join->tables) - 1;
<igor3>       
<igor3>       } because if we do it, table elimination will assume that
<igor3>         - constructs like "COUNT(*)" use columns from all tables
<igor3>         - so, it is not possible to eliminate any table
<igor3>       our solution for COUNT(*) is that it has
<igor3>         item->used_tables() == 0 && !item->const_item()
<igor3>     */

Comment by Sergei Petrunia [ 2016-01-27 ]

<igor3> the question for me is:
<igor3> how to catch items with local COUNT(*) here:
<igor3>         if ((item->real_type() == Item::SUBSELECT_ITEM) ||
<igor3>             (item->used_tables() & ~OUTER_REF_TABLE_BIT))
<igor3>         {
<igor3>    /*
<igor3>      Mark that the we have ignored an item that refers to a summary
<igor3>      function. We need to know this if someone is going to use
<igor3>      DISTINCT on the result.
<igor3>    */
<igor3>    param->using_indirect_summary_function=1;
<igor3>    continue;
<igor3>         }
<igor3>  

Generated at Thu Feb 08 07:34:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.