Currently, DISTINCT is implemented by creating a GROUP BY with all the non-aggregate columns. This causes DISTINCT to be executed before Window Functions, and it includes the partition and order by columns from the window function. The standard is do the uniqueness after Window Functions, just before UNION (if any). The results are similar, unless Window Functions are involved, in which case, the DISTINCT being executed before Window Functions deletes rows that should be included in the Window Function processing, resulting in the wrong answer.
TupleAnnexStep already has this capability. By disabling the DISTINCT as aggregate feature in the presence of Window Functions, we get the proper answer.
David Hall (Inactive)
added a comment - TupleAnnexStep already has this capability. By disabling the DISTINCT as aggregate feature in the presence of Window Functions, we get the proper answer.
I don't know if this is a internal logic change, or a change that corrects results output. I tried the following test case from the developer on 1.2.5-1, for both ColumnStore and InnoDB tables (1GB D BT3 database) and got identical results (See below). I then tried the same test case on 1.2.6-1 and also got the same results.
QA needs some input from development on the identified issue and implemented fix.
Thanks.
MariaDB [tpch1]> select count from lineitem where l_orderkey > 5996000;
----------
count
----------
4012
----------
1 row in set (0.155 sec)
MariaDB [tpch1]> select distinct l_orderkey as ord, sum(l_quantity) over (partition by l_orderkey) as qty_sum from lineitem where l_orderkey > 5996000;
----------------+
ord
qty_sum
----------------+
.
.
.
5999972
80.00
5999973
50.00
5999974
70.00
5999975
57.00
6000000
33.00
----------------+
1000 rows in set (0.263 sec)
Removed distinct
MariaDB [tpch1]> select l_orderkey as ord, sum(l_quantity) over (partition by l_orderkey) as qty_sum from lineitem where l_orderkey > 5996000;
----------------+
ord
qty_sum
----------------+
.
.
.
5999975
57.00
5999975
57.00
5999975
57.00
6000000
33.00
6000000
33.00
----------------+
4012 rows in set (0.177 sec)
Daniel Lee (Inactive)
added a comment - - edited build tested: 1.2.6-1
engine commit:
d4173ef
I don't know if this is a internal logic change, or a change that corrects results output. I tried the following test case from the developer on 1.2.5-1, for both ColumnStore and InnoDB tables (1GB D BT3 database) and got identical results (See below). I then tried the same test case on 1.2.6-1 and also got the same results.
QA needs some input from development on the identified issue and implemented fix.
Thanks.
MariaDB [tpch1] > select count from lineitem where l_orderkey > 5996000;
----------
count
----------
4012
----------
1 row in set (0.155 sec)
MariaDB [tpch1] > select distinct l_orderkey as ord, sum(l_quantity) over (partition by l_orderkey) as qty_sum from lineitem where l_orderkey > 5996000;
-------- --------+
ord
qty_sum
-------- --------+
.
.
.
5999972
80.00
5999973
50.00
5999974
70.00
5999975
57.00
6000000
33.00
-------- --------+
1000 rows in set (0.263 sec)
Removed distinct
MariaDB [tpch1] > select l_orderkey as ord, sum(l_quantity) over (partition by l_orderkey) as qty_sum from lineitem where l_orderkey > 5996000;
-------- --------+
ord
qty_sum
-------- --------+
.
.
.
5999975
57.00
5999975
57.00
5999975
57.00
6000000
33.00
6000000
33.00
-------- --------+
4012 rows in set (0.177 sec)
TupleAnnexStep already has this capability. By disabling the DISTINCT as aggregate feature in the presence of Window Functions, we get the proper answer.