Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16579

Wrong result of query using DISTINCT COUNT(*) OVER (*)

Details

    Description

      CREATE TABLE tt (i int) ;
      INSERT INTO `tt` VALUES (1),(0),(1),(2),(0),(NULL),(1),(2),(NULL),(1),(2);
       
      SELECT  DISTINCT COUNT(*) OVER () FROM tt GROUP BY i ;
      SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
      

      MariaDB [test]> SELECT  DISTINCT COUNT(*) OVER () FROM tt GROUP BY i ;
      +------------------+
      | COUNT(*) OVER () |
      +------------------+
      |                4 |
      +------------------+
      1 row in set (0.001 sec)
       
      MariaDB [test]> SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
      +------------------+---------------+
      | COUNT(*) OVER () | MOD(MIN(i),2) |
      +------------------+---------------+
      |                0 |          NULL |
      |                0 |             0 |
      |                0 |             1 |
      +------------------+---------------+
      3 rows in set (0.001 sec)
      

      in Mysql 8.0.11

      mysql> SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
      +------------------+---------------+
      | COUNT(*) OVER () | MOD(MIN(i),2) |
      +------------------+---------------+
      |                4 |             1 |
      |                4 |             0 |
      |                4 |          NULL |
      +------------------+---------------+
      3 rows in set (0.00 sec)
      

      Attachments

        Activity

          varun Varun Gupta (Inactive) added a comment - - edited

          The query is:  SELECT  DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;
           
          This query requires two temp tables for execution, lets say these are tmp_table_1 and tmp_table_2
           
          tmp_table_1 fields =>  i, MIN(i), COUNT(*) OVER ()
          tmp_table_2 fields =>  i, MIN(i), COUNT(*) OVER (), MOD(MIN(i),2)
          

          So the execution is like
           
          1)Read data from table t1
               a) Perform grouping on the fly
               b) update MIN(i)
               c) write the new value to temporary table tmp_table1
          2) Read the records from temp table tmp_table1
          3) Evaluate MOD(MIN(i),2) and write the new record in the tmp_table2
          4) After all records are written, then compute the window function COUNT(*) OVER () on tmp_table2
          5) Perform Distinct on tmp_table2
          
          

          varun Varun Gupta (Inactive) added a comment - - edited The query is: SELECT DISTINCT COUNT(*) OVER (), MOD(MIN(i),2) FROM tt GROUP BY i ;   This query requires two temp tables for execution, lets say these are tmp_table_1 and tmp_table_2   tmp_table_1 fields => i, MIN(i), COUNT(*) OVER () tmp_table_2 fields => i, MIN(i), COUNT(*) OVER (), MOD(MIN(i),2) So the execution is like   1)Read data from table t1 a) Perform grouping on the fly b) update MIN(i) c) write the new value to temporary table tmp_table1 2) Read the records from temp table tmp_table1 3) Evaluate MOD(MIN(i),2) and write the new record in the tmp_table2 4) After all records are written, then compute the window function COUNT(*) OVER () on tmp_table2 5) Perform Distinct on tmp_table2

          (gdb) p (join_tab-2)->table
          $17 = (TABLE *) 0x61f00002f4f0
          (gdb) p (join_tab-2)->table->alias.Ptr
          $18 = 0x60c0000170f0 "t1"
           
          (gdb) p (join_tab-1)->table
          $19 = (TABLE *) 0x62200000f190
          (gdb) p (join_tab-1)->table->alias.Ptr
          $20 = 0x555557f2df60 ""
           
          (gdb) p (join_tab)->table
          $21 = (TABLE *) 0x622000010990
          (gdb) p (join_tab)->table->alias.Ptr
          $22 = 0x555557f2df60 ""
           
           
          (gdb) p item_win->result_field
          $23 = (Field *) 0x6190000417b0
          (gdb) p item_win->result_field->table
          $24 = (TABLE *) 0x62200000f190
          (gdb) p dbug_print_item(item_win)
          $26 = 0x555559464800 <dbug_item_print_buf> "count(0) over ()"
          

          varun Varun Gupta (Inactive) added a comment - (gdb) p (join_tab-2)->table $17 = (TABLE *) 0x61f00002f4f0 (gdb) p (join_tab-2)->table->alias.Ptr $18 = 0x60c0000170f0 "t1"   (gdb) p (join_tab-1)->table $19 = (TABLE *) 0x62200000f190 (gdb) p (join_tab-1)->table->alias.Ptr $20 = 0x555557f2df60 ""   (gdb) p (join_tab)->table $21 = (TABLE *) 0x622000010990 (gdb) p (join_tab)->table->alias.Ptr $22 = 0x555557f2df60 ""     (gdb) p item_win->result_field $23 = (Field *) 0x6190000417b0 (gdb) p item_win->result_field->table $24 = (TABLE *) 0x62200000f190 (gdb) p dbug_print_item(item_win) $26 = 0x555559464800 <dbug_item_print_buf> "count(0) over ()"
          varun Varun Gupta (Inactive) added a comment - Patch http://lists.askmonty.org/pipermail/commits/2019-December/014085.html

          Followup to the optimizer call discussion: I wasn't able to find any evidence that SELECT DISTINCT window_function()" is not allowed by the SQL Standard.

          I was able to find this evidence that others expect it to be allowed:

          http://www.mysqltutorial.org/mysql-window-functions/ says
          "Note that window functions are performed on the result set after all JOIN, WHERE, GROUP BY, and HAVING clauses and before the ORDER BY, LIMIT and SELECT DISTINCT."

          https://blog.jooq.org/2013/10/09/sql-trick-row_number-is-to-select-what-dense_rank-is-to-select-distinct/ uses

          SELECT DISTINCT
            v1,
            v2,
            v3,
            DENSE_RANK() OVER (window) row_number
          FROM t
          WINDOW window AS (ORDER BY v1, v2, v3)
          

          psergei Sergei Petrunia added a comment - Followup to the optimizer call discussion: I wasn't able to find any evidence that SELECT DISTINCT window_function()" is not allowed by the SQL Standard. I was able to find this evidence that others expect it to be allowed: http://www.mysqltutorial.org/mysql-window-functions/ says "Note that window functions are performed on the result set after all JOIN, WHERE, GROUP BY, and HAVING clauses and before the ORDER BY, LIMIT and SELECT DISTINCT." https://blog.jooq.org/2013/10/09/sql-trick-row_number-is-to-select-what-dense_rank-is-to-select-distinct/ uses SELECT DISTINCT v1, v2, v3, DENSE_RANK() OVER (window) row_number FROM t WINDOW window AS (ORDER BY v1, v2, v3)
          psergei Sergei Petrunia added a comment - - edited

          SQL Standard or not, Sybase Manual here http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbusage/ug-olap-s-3858836.html has examples like

          SELECT DISTINCT ID,
          FIRST_VALUE( ProductID ) OVER ( PARTITION BY ID ORDER BY LineID )
          FROM SalesOrderItems
          ORDER BY ID;
          

          psergei Sergei Petrunia added a comment - - edited SQL Standard or not, Sybase Manual here http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbusage/ug-olap-s-3858836.html has examples like SELECT DISTINCT ID, FIRST_VALUE( ProductID ) OVER ( PARTITION BY ID ORDER BY LineID ) FROM SalesOrderItems ORDER BY ID;

          I can confirm that SQL Standard says:
          DISTINCT as a <set quantifier> is valid before any valid <select list>

          igor Igor Babaev (Inactive) added a comment - I can confirm that SQL Standard says: DISTINCT as a <set quantifier> is valid before any valid <select list>

          The patch is ok to push.

          psergei Sergei Petrunia added a comment - The patch is ok to push.

          People

            varun Varun Gupta (Inactive)
            alice Alice Sherepa
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.