[MCOL-3343] Window Functions don't work with arithmetic operators or other functions Created: 2019-05-30  Updated: 2019-10-28  Resolved: 2019-07-05

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.1.5, 1.2.3
Fix Version/s: 1.2.5

Type: Task Priority: Major
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MCOL-3338 Sporadic Internal error: TreeNode::ge... Closed
Relates
relates to MCOL-3338 Sporadic Internal error: TreeNode::ge... Closed
Sprint: 2019-05, 2019-06

 Description   

Discovered during QA:

create table tb1 (d1 decimal(7,2), i1 int)engine=columnstore;
insert into tb1 values (7.12, 1), (12.12, 1);

select (sum(d1)*100.0) div sum(sum(d1)) over (partition by i1) from tb1 group by i1;

This results in a random output in 1.2, and a 0 output in 1.1. The answer is 100 as can be seen when using innodb:
MariaDB [dhall]> select (sum(d1)*100.0) div sum(sum(d1)) over (partition by i1) from inno_tb1 group by i1;
---------------------------------------------------------

(sum(d1)*100.0) div sum(sum(d1)) over (partition by i1)

---------------------------------------------------------

100

---------------------------------------------------------
1 row in set (0.008 sec)

Using the divide operator '/' instead of the div function may result in a different set of random numbers.



 Comments   
Comment by David Hall (Inactive) [ 2019-05-31 ]

It appears that if one side of an arithmetic operator (or function like DIV) is an aggregate and the other is a Window Function, then somehow it gets the operator twice as a sub function of each side. When each is executed, the other side isn't available. Whichever gets done last is the final result.

ExeMgr spins up threads for each, so in the following query:

select count / sum(d1) over () from tb1 group by d1;

The count will get done in one thread and perform the divide of count with unknown in the denominator.
The sum(d1) will get done in another thread and perform the divide with unknown in the numerator and sum(d1) in the denominator.
On my machine, the second result is returned.

Somehow, the setup code is acting as if the '/' is inside of each function, rather than the root of the tree.

This does not happen if we do aggregate <op> aggregate. Only if one side is a Window Function.

Comment by David Hall (Inactive) [ 2019-06-14 ]

The title of this MCOL says incorrect in a specific case. That is incorrect. It's a much more general case involving more than one cause.

After fixing the original compaint of agg/win, I thought to play around with more complex arithmetic formula involving window and aggregate functions and got a crash. Simplifying it down, it turns out something simple doesn't work:
MariaDB [dhall]> select sum(d1) over() / c1 from tb1;
ERROR 1815 (HY000): Internal error: IDB-9024: 'c1' is not in tuple.
Whereas aggregate:
MariaDB [dhall]> select sum(d1) / c1 from tb1 group by c1;
--------------------

sum(d1) / c1

--------------------

0.2714285714285714
2.267515923566879
0.7087378640776698

--------------------
The c1 is not in tuple error implies that the setup logic doesn't realize that c1 has to be in the projection list. Figuring out just where to make a small change isn't always easy.
Interestingly.
MariaDB [dhall]> select c1,sum(d1) over() / c1 from tb1;
-------------------------+

c1 sum(d1) over() / c1

-------------------------+

3.14 4.095541401273885
5.15 2.4970873786407766
7.7 1.67012987012987

-------------------------+
This works because c1 is already in the projection list because we specifically asked for it.

However, fixing this little problem may or may not cause this to work:
MariaDB [dhall]> select c1*sum(d1) over() / count from tb1 group by c1, d1;
--------------------------------

c1*sum(d1) over() / count

--------------------------------

0.0000000000007071900395043137
NULL
NULL

--------------------------------
As you can see, this returns garbage.

Comment by David Hall (Inactive) [ 2019-06-20 ]

Test case PR #121

For QA: The bug is manifest any time a Window Function is used with another function or arithmetic operator. See the regression tests in working_tpch1_compareLogOnly/windowFunctions/MCOL-3343.sql for examples. You can then try any combination of Window Functions, aggregates and simple columns to try to break it. Use any database for a ref if you want. Notice that MariaDB Server may allow some combinations without the proper GROUP BY clause, but CS does not allow such things.

Comment by David Hall (Inactive) [ 2019-06-20 ]

Once this bug is closed, then MCOL-3338 should be tested and closed

Comment by Daniel Lee (Inactive) [ 2019-07-05 ]

Build verified: 1.2.5-1 nightly

erver commit:
f44f7d9
engine commit:
4e477ab

Verified test case in the ticket.

Generated at Thu Feb 08 02:42:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.