[MCOL-5430] Window Functions in projection with GROUP BY delivers wrong results Created: 2023-02-20  Updated: 2023-12-22

Status: Open
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: 22.08.7
Fix Version/s: 23.10

Type: Bug Priority: Major
Reporter: Roman Assignee: Roman
Resolution: Unresolved Votes: 0
Labels: rm_invalid_data

Issue Links:
Relates
relates to MCOL-5187 OOM happening when querying large dat... Stalled
Sprint: 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10, 2023-11

 Description   

Consider the example:

CREATE TABLE empsalary (
    depname varchar(100),
    empno bigint,
    salary int,
    enroll_date date
)engine=columnstore;
 
insert into empsalary values ('develop'  , 10, 5200, '2007-08-01');
insert into empsalary values ('sales'    ,  1, 5000, '2006-10-01');
insert into empsalary values ('personnel',  5, 3500, '2007-12-10');
insert into empsalary values ('sales'    ,  4, 4800, '2007-08-08');
insert into empsalary values ('personnel',  2, 3900, '2006-12-23');
insert into empsalary values ('develop'  ,  7, 4200, '2008-01-01');
insert into empsalary values ('develop'  ,  9, 4500, '2008-01-01');
insert into empsalary values ('sales'    ,  3, 4800, '2007-08-01');
insert into empsalary values ('develop'  ,  8, 6000, '2006-10-01');
insert into empsalary values ('develop'  , 11, 5200, '2007-08-15');
insert into empsalary values ('develop'  , 12, null, '2008-08-09');

Here is the SELECT that differs b/w MDB and MCS. Here is MCS version.

MariaDB [test]> select avg(salary),depname, moda(salary) over(partition by depname order by enroll_date) from empsalary group by depname;
+-------------+-----------+--------------------------------------------------------------+
| avg(salary) | depname   | moda(salary) over(partition by depname order by enroll_date) |
+-------------+-----------+--------------------------------------------------------------+
|   6000.0000 | develop   | 6000                                                         |
|   5200.0000 | develop   | 5200                                                         |
|   5200.0000 | develop   | 5200                                                         |
|   4200.0000 | develop   | 5200                                                         |
|   4500.0000 | develop   | 5200                                                         |
|        NULL | develop   | 5200                                                         |
|   3900.0000 | personnel | 3900                                                         |
|   3500.0000 | personnel | 3500                                                         |
|   5000.0000 | sales     | 5000                                                         |
|   4800.0000 | sales     | 4800                                                         |
|   4800.0000 | sales     | 4800                                                         |
+-------------+-----------+--------------------------------------------------------------+
11 rows in set, 1 warning (0.080 sec)

Here is MDB.

MariaDB [test]> select avg(salary),depname, moda(salary) over(partition by depname order by enroll_date) from emp group by depname;
+-------------+-----------+--------------------------------------------------------------+
| avg(salary) | depname   | moda(salary) over(partition by depname order by enroll_date) |
+-------------+-----------+--------------------------------------------------------------+
|   5020.0000 | develop   | 5200                                                         |
|   3700.0000 | personnel | 3500                                                         |
|   4866.6667 | sales     | 5000                                                         |
+-------------+-----------+--------------------------------------------------------------+
3 rows in set (0.002 sec)

The origins of this incorrect behavior is the fact that RowStorage::hash gets an incorrect number of key columns to hash on thus MCS treats every row it receives from scanning in BPP::execute as a unique one.
The calculation of key columns number is buried deep in CSEP to JobList translation, namely in addProjectStepsToBps that is called inside combineJobStepsByTable. In simple words TAS uses the set of columns returned the previous delivery/projection step as TAS input. It removes all aggregate columns from the set and those that left are TAS key columns participate in hashing.
In case when WF is projection combineJobStepsByTable run by TAS sees aux columns added by WF so TAS hashing miscalculates every row hash making all rows potentially unique and this alters TAS result set.


Generated at Thu Feb 08 02:57:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.