[MCOL-3492] Do DISTINCT in it's proper order -- after Window Functions and before UNION Created: 2019-09-09 Updated: 2020-08-25 Due: 2019-09-25 Resolved: 2019-12-11 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | 1.2.5 |
| Fix Version/s: | 1.2.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hall (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Sprint: | 2019-06 |
| Description |
|
Currently, DISTINCT is implemented by creating a GROUP BY with all the non-aggregate columns. This causes DISTINCT to be executed before Window Functions, and it includes the partition and order by columns from the window function. The standard is do the uniqueness after Window Functions, just before UNION (if any). The results are similar, unless Window Functions are involved, in which case, the DISTINCT being executed before Window Functions deletes rows that should be included in the Window Function processing, resulting in the wrong answer. This page shows the expected order of SQL processing: https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/ |
| Comments |
| Comment by David Hall (Inactive) [ 2019-09-12 ] | ||||||||||||||||||||||||||
|
TupleAnnexStep already has this capability. By disabling the DISTINCT as aggregate feature in the presence of Window Functions, we get the proper answer. | ||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-09-12 ] | ||||||||||||||||||||||||||
|
regressiontest PR #139 | ||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2019-12-11 ] | ||||||||||||||||||||||||||
|
build tested: 1.2.6-1 engine commit: I don't know if this is a internal logic change, or a change that corrects results output. I tried the following test case from the developer on 1.2.5-1, for both ColumnStore and InnoDB tables (1GB D BT3 database) and got identical results (See below). I then tried the same test case on 1.2.6-1 and also got the same results. QA needs some input from development on the identified issue and implemented fix. Thanks. MariaDB [tpch1]> select count
----------
---------- MariaDB [tpch1]> select distinct l_orderkey as ord, sum(l_quantity) over (partition by l_orderkey) as qty_sum from lineitem where l_orderkey > 5996000;
--------
-------- Removed distinct MariaDB [tpch1]> select l_orderkey as ord, sum(l_quantity) over (partition by l_orderkey) as qty_sum from lineitem where l_orderkey > 5996000;
--------
-------- | ||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2019-12-11 ] | ||||||||||||||||||||||||||
|
This test case tests the modified code. All looks well. |