[MCOL-1793] regr_slope() and regr_r2() produce incorrect result when used as window functions Created: 2018-10-11  Updated: 2020-08-25  Resolved: 2018-11-29

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.2.0
Fix Version/s: 1.2.2

Type: Bug Priority: Critical
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File regr_r2.100.sql     Text File regr_r2.100.sql.ref.txt     Text File regr_r2.100.sql.tst.txt    
Issue Links:
Duplicate
is duplicated by MCOL-1882 Aggregates performance degradation be... Closed
Problem/Incident
is caused by MCOL-521 add distributed regression aggregate ... Closed
is caused by MCOL-1759 Implement regr_sxx, regr_syy, regr_sx... Closed
Sprint: 2018-19, 2018-20

 Description   

The mentioned two functions returned different values as compared to Oracle 18c.



 Comments   
Comment by David Hall (Inactive) [ 2018-10-12 ]

This problem only seems to occur if the Window size is one row.

Comment by David Hall (Inactive) [ 2018-11-01 ]

It's not because of one row. This problem seems to be because the Current Row is outside the current frame.

Comment by David Hall (Inactive) [ 2018-11-02 ]

It appears that any UDAnF may show this issue. Here's a simple case where all values should be 0.00000000:
select o_custkey, regr_sxy(o_custkey,o_orderkey) OVER (PARTITION BY o_custkey ROWS BETWEEN 15 PRECEDING AND 15 PRECEDING) regr from orders where o_custkey = 100 order by 1, 2;
-------------------------------+

o_custkey regr

-------------------------------+

100 NULL
100 -1758119600.00000000
100 -1275744800.00000000
100 -816135000.00000000
100 -494070400.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000
100 0.00000000

-------------------------------+
20 rows in set (0.099 sec)

Comment by David Hall (Inactive) [ 2018-11-06 ]

The issue may also show up in built-in aggregates as Window Functions.

Comment by David Hall (Inactive) [ 2018-11-23 ]

This includes the removal of the wide table usage in regr.

Comment by Daniel Lee (Inactive) [ 2018-11-27 ]

Build tested: 2018-11-26 nightly build, Lentos 7 RPM

regr_slope result is now matching with Oracle
regr_r2 result is still not matching. Please see attached results.

Comment by David Hall (Inactive) [ 2018-11-27 ]

Built in Aggregates now work correctly. Also, the following agrees with a reference Postgrsql:

MariaDB [tpch1]> select o_custkey, regr_sxy(o_custkey,o_orderkey) OVER (PARTITION BY o_custkey ROWS BETWEEN 15 PRECEDING AND 15 PRECEDING) regr from orders where o_custkey = 100 order by 1, 2;
---------------+

o_custkey regr

---------------+

100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 NULL
100 0
100 0
100 0
100 0
100 0

---------------+
20 rows in set (0.112 sec)

Comment by David Hall (Inactive) [ 2018-11-28 ]

When used as a Window Function, a UDAnF with PARTITION BY restarts the calculations for each partition. The API is written such that if the function wants a NULL result, it simply doesn't assign a value in evaluate. Unfortunately, the framework isn't resetting the return value back to NULL each cycle, so any previous value was used instead.

The fix is to reset fValOut each time.

Comment by David Hall (Inactive) [ 2018-11-28 ]

Daniel bounced this back with one more small bug. If needed, it can wait for the next minor release.

Comment by Daniel Lee (Inactive) [ 2018-11-29 ]

Build tested: 1.2.2-1 (released to QA 2018-11-29 morning)

regr_slope() has matching results
regr_r2() is still having some variance and that is not due to significant digits

16 out of 248 queries have non-matching results.
The following is the reference and test results from query #166.
So the reference has some 0s, while the test have 1s.

[root@localhost compareResult]# more ref/regr_r2/regr_r2.166.sql.ref.txt
1 0.0000
1 1.0000
1 1.0000
1 1.0000
1 1.0000
2 0.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
4 0.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
5 0.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
this is the test result
[root@localhost compareResult]# more tst/regr_r2/regr_r2.166.sql.tst.txt
1 1.0000
1 1.0000
1 1.0000
1 1.0000
1 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
2 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
4 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000
5 1.0000

here is the query on a 1mb dbt3 database
select o_custkey, regr_r2(o_custkey,o_orderkey) OVER (PARTITION BY o_custkey ORDER BY o_custkey ,o_orderkey ROWS BETWEEN CURRENT ROW AND 15 FOLLOWING) from (select * from orders where o_custkey <= 20000) s order by 1, 2;

Comment by Daniel Lee (Inactive) [ 2018-11-29 ]

The remaining variance issue for regr_r2() is being tracked by MCOL-1983. Closing this ticket now.

Generated at Thu Feb 08 02:31:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.