[MDEV-16579] Wrong result of query using DISTINCT COUNT(*) OVER (*) Created: 2018-06-25  Updated: 2019-12-24  Resolved: 2019-12-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.2.31, 10.3.22, 10.4.12, 10.5.1

Type: Bug Priority: Major
Reporter: Alice Sherepa Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

CREATE TABLE tt (i int) ;
INSERT INTO `tt` VALUES (1),(0),(1),(2),(0),(NULL),(1),(2),(NULL),(1),(2);
 
SELECT  DISTINCT COUNT(*) OVER () FROM tt GROUP BY i ;
SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;

MariaDB [test]> SELECT  DISTINCT COUNT(*) OVER () FROM tt GROUP BY i ;
+------------------+
| COUNT(*) OVER () |
+------------------+
|                4 |
+------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
+------------------+---------------+
| COUNT(*) OVER () | MOD(MIN(i),2) |
+------------------+---------------+
|                0 |          NULL |
|                0 |             0 |
|                0 |             1 |
+------------------+---------------+
3 rows in set (0.001 sec)

in Mysql 8.0.11

mysql> SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
+------------------+---------------+
| COUNT(*) OVER () | MOD(MIN(i),2) |
+------------------+---------------+
|                4 |             1 |
|                4 |             0 |
|                4 |          NULL |
+------------------+---------------+
3 rows in set (0.00 sec)



 Comments   
Comment by Varun Gupta (Inactive) [ 2019-12-05 ]

The query is:  SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
 
This query requires two temp tables for execution, lets say these are tmp_table_1 and tmp_table_2
 
tmp_table_1 fields =>  i, MIN(i), COUNT(*) OVER ()
tmp_table_2 fields =>  i, MIN(i), COUNT(*) OVER (), MOD(MIN(i),2)

So the execution is like
 
1)Read data from table t1
     a) Perform grouping on the fly
     b) update MIN(i)
     c) write the new value to temporary table tmp_table1
2) Read the records from temp table tmp_table1
3) Evaluate MOD(MIN(i),2) and write the new record in the tmp_table2
4) After all records are written, then compute the window function COUNT(*) OVER () on tmp_table2
5) Perform Distinct on tmp_table2

Comment by Varun Gupta (Inactive) [ 2019-12-05 ]

(gdb) p (join_tab-2)->table
$17 = (TABLE *) 0x61f00002f4f0
(gdb) p (join_tab-2)->table->alias.Ptr
$18 = 0x60c0000170f0 "t1"
 
(gdb) p (join_tab-1)->table
$19 = (TABLE *) 0x62200000f190
(gdb) p (join_tab-1)->table->alias.Ptr
$20 = 0x555557f2df60 ""
 
(gdb) p (join_tab)->table
$21 = (TABLE *) 0x622000010990
(gdb) p (join_tab)->table->alias.Ptr
$22 = 0x555557f2df60 ""
 
 
(gdb) p item_win->result_field
$23 = (Field *) 0x6190000417b0
(gdb) p item_win->result_field->table
$24 = (TABLE *) 0x62200000f190
(gdb) p dbug_print_item(item_win)
$26 = 0x555559464800 <dbug_item_print_buf> "count(0) over ()"

Comment by Varun Gupta (Inactive) [ 2019-12-05 ]

Patch
http://lists.askmonty.org/pipermail/commits/2019-December/014085.html

Comment by Sergei Petrunia [ 2019-12-12 ]

Followup to the optimizer call discussion: I wasn't able to find any evidence that SELECT DISTINCT window_function()" is not allowed by the SQL Standard.

I was able to find this evidence that others expect it to be allowed:

http://www.mysqltutorial.org/mysql-window-functions/ says
"Note that window functions are performed on the result set after all JOIN, WHERE, GROUP BY, and HAVING clauses and before the ORDER BY, LIMIT and SELECT DISTINCT."

https://blog.jooq.org/2013/10/09/sql-trick-row_number-is-to-select-what-dense_rank-is-to-select-distinct/ uses

SELECT DISTINCT
  v1,
  v2,
  v3,
  DENSE_RANK() OVER (window) row_number
FROM t
WINDOW window AS (ORDER BY v1, v2, v3)

Comment by Sergei Petrunia [ 2019-12-12 ]

SQL Standard or not, Sybase Manual here http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbusage/ug-olap-s-3858836.html has examples like

SELECT DISTINCT ID,
FIRST_VALUE( ProductID ) OVER ( PARTITION BY ID ORDER BY LineID )
FROM SalesOrderItems
ORDER BY ID;

Comment by Igor Babaev [ 2019-12-12 ]

I can confirm that SQL Standard says:
DISTINCT as a <set quantifier> is valid before any valid <select list>

Comment by Sergei Petrunia [ 2019-12-17 ]

The patch is ok to push.

Generated at Thu Feb 08 08:29:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.