[MCOL-3435] Aggregate and Window functions can't co-exist Created: 2019-08-15  Updated: 2020-01-20  Resolved: 2020-01-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.1.6, 1.2.5
Fix Version/s: 1.4

Type: Task Priority: Major
Reporter: David Hall (Inactive) Assignee: David Hall (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Sprint: 2019-06, 2020-1

 Description   

A simple query containing both a window function and an aggregate will cause an eroneous syntax err:

MariaDB [dhall]> select avg(comm_total) as wk_comm_avg, max(trade_date) over () as max_week from book_entry_cs1;
ERROR 1815 (HY000): Internal error: IDB-2021: 'trade_date' is not in GROUP BY clause. All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
MariaDB [dhall]>

The error is thrown by tupleaggregatestep.cpp prep2PhasesAggregate() (for this case) during setup. It checks for window functions, but the items inside the window function are erroneously marked as not in a group by. They don't need to be because they're in a window function.

It uses jobInfo.windowSet to look for window functions. These are entered into the set by WindowFunctionStep::checkWindowFunction(). The parts of the window function are not added.

Investigation needs to made on whether they should just be added to the windowSet, or should something else be done.

These queries work in InnoDB.



 Comments   
Comment by David Hall (Inactive) [ 2019-09-12 ]

The fix is to add the elements of the Window Function as implied group by with the caveat that an element that is a nested aggregate not be grouped.

It appears that other DB's insist that they be in an explicit group by, but MariaDB does not. Both Postgres and Oracle were tested.

Comment by David Hall (Inactive) [ 2019-09-12 ]

regressiontest PR#140

Comment by Daniel Lee (Inactive) [ 2019-12-10 ]

Build verified: 1.2.6-1

engine commit:
d4173ef

I did the following test on both ColumnStore and InnoDB tables.

1) Create DBT3 orders table
2) load 1 MB dataset ( 1500 rows)
3) Run this query "select avg(o_totalprice) as a, max(o_orderkey) over () as b from orders;"

InnoDB and ColumnStore tables returned difference resultsets (see below). Sending the ticket back to further investigation.

InnoDB return 1 row:

MariaDB [tpch1i]> select avg(o_totalprice) from orders;
-------------------

avg(o_totalprice)

-------------------

100672.603033

-------------------
1 row in set (0.006 sec)

ColumnStore table return 1500 rows

MariaDB [tpch1m]> select avg(o_totalprice) as a, max(o_orderkey) over () as b from orders;
-------------------+

a b

-------------------+

16763.950000 5988
79646.890000 5988
3223.170000 5988
114990.630000 5988

.
.
.

163794.530000 5988
13282.230000 5988
34768.680000 5988
120626.490000 5988
199102.230000 5988

-------------------+
1500 rows in set (0.197 sec)

Generated at Thu Feb 08 02:42:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.