[MCOL-1983] regr_intercept, regr_r2, regr_slope and possibly other regr functions should return NULL with only one row. Created: 2018-11-29  Updated: 2019-02-08  Resolved: 2019-02-08

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.2.1
Fix Version/s: 1.2.3

Type: Bug Priority: Minor
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sprint: 2018-21, 2019-01

 Description   

These regr functions fit a line to the data. At least two data points are required to define a line, so when the functions are given only one data point, they should return NULL. This would be consistent with PostgreSQL, Oracle and Snowflake.
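The expected behaviour can be sketched in Python as follows. This is a minimal illustration, not the ColumnStore implementation: the helper `_sums`, the `(y, x)` pair layout, and the explicit `len(pairs) < 2` guard are assumptions made for the example, and `None` stands in for SQL NULL.

```python
from typing import List, Optional, Tuple

# (y, x) pairs, mirroring the SQL argument order regr_*(Y, X).
# Rows with a NULL in either column are assumed to be filtered out already.
Pair = Tuple[float, float]


def _sums(pairs: List[Pair]) -> Tuple[float, float, float, float, float]:
    """Return mean_y, mean_x and the centred sums Sxx, Syy, Sxy (hypothetical helper)."""
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    syy = sum((y - mean_y) ** 2 for y, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in pairs)
    return mean_y, mean_x, sxx, syy, sxy


def regr_slope(pairs: List[Pair]) -> Optional[float]:
    if len(pairs) < 2:          # a single point cannot define a line
        return None
    _, _, sxx, _, sxy = _sums(pairs)
    if sxx == 0:                # all x values equal: vertical line, slope undefined
        return None
    return sxy / sxx


def regr_intercept(pairs: List[Pair]) -> Optional[float]:
    slope = regr_slope(pairs)
    if slope is None:
        return None
    mean_y, mean_x, _, _, _ = _sums(pairs)
    return mean_y - slope * mean_x


def regr_r2(pairs: List[Pair]) -> Optional[float]:
    if len(pairs) < 2:
        return None
    _, _, sxx, syy, sxy = _sums(pairs)
    if sxx == 0:
        return None
    if syy == 0:                # horizontal line fits perfectly
        return 1.0
    return (sxy * sxy) / (sxx * syy)
```

With a single pair such as `[(2.0, 1.0)]`, all three functions return `None`. Note that Sxx is zero for one row anyway, so the row-count guard is redundant here and is kept only to make the intent explicit.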



 Comments   
Comment by David Hall (Inactive) [ 2018-12-10 ]

Ready for test. Any query that applies one of these regr functions to a single row should now return NULL.

Comment by Roman [ 2018-12-10 ]

It looks like this isn't covered by the regression test suite, because the suite wasn't changed. Could you add a query that uses the mentioned UDFs with only one point in the dataset?

Comment by Roman [ 2018-12-10 ]

Though code is OK.

Comment by David Hall (Inactive) [ 2019-01-02 ]

Spent a bit of time re-writing many of the tests for the regr_*** functions, as it turned out the results were non-deterministic, meaning our regression tests failed because of small differences.

Specifically, I added ROUND() to many to get rid of DOUBLE inconsistencies, and changed all the queries that use RANK for nested testing so that the inside query returned unique results and the RANK wouldn't be random.
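The effect of ROUND() on test stability can be illustrated with a small Python sketch. It assumes that the small differences come from rows being accumulated in a different order from run to run, which the ticket does not state explicitly; the function names are hypothetical.

```python
from typing import Iterable


def ordered_sum(values: Iterable[float]) -> float:
    """Add values left to right, as a simple aggregate accumulator would."""
    total = 0.0
    for v in values:
        total += v
    return total


def stable(value: float, digits: int = 6) -> float:
    """Round a result before comparing it, analogous to wrapping it in ROUND()."""
    return round(value, digits)


forward = ordered_sum([0.1, 0.2, 0.3])
backward = ordered_sum([0.3, 0.2, 0.1])

# The raw DOUBLE results differ in the last bits depending on row order,
# but agree once both are rounded to a fixed number of digits.
raw_results_match = forward == backward
rounded_results_match = stable(forward) == stable(backward)
```

Here `raw_results_match` is false while `rounded_results_match` is true, which is why comparing rounded values gives a repeatable regression test.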

I also changed all the queries that had a non-numeric type as an argument, as this caused inconsistencies as well. The code was changed to disallow non-numeric arguments where they don't make sense.

Comment by David Hall (Inactive) [ 2019-01-21 ]

Modified the regr_*** functions to use long double internally. With g++ on Intel-based (x86-64) CPUs, long double is typically the 80-bit x87 extended-precision format, stored in 128 bits. This additional accuracy helps a bit with the rounding error problems we've been having.
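Why a wider internal accumulator helps can be sketched in Python. Python has no built-in long double, so this example swaps in `math.fsum`, which tracks the lost low-order bits exactly, as a stand-in for a higher-precision accumulator. It illustrates the principle only and is not the technique used in the fix.

```python
import math
from typing import List


def naive_sum(values: List[float]) -> float:
    """Accumulate in plain 64-bit doubles (53-bit mantissa)."""
    total = 0.0
    for v in values:
        total += v
    return total


def precise_sum(values: List[float]) -> float:
    """Stand-in for a wider accumulator: math.fsum avoids intermediate rounding loss."""
    return math.fsum(values)


# 1e16 + 1.0 is not representable in a 64-bit double, so the 1.0 is lost
# before the large terms cancel. A wider accumulator keeps it.
data = [1e16, 1.0, -1e16]
lossy = naive_sum(data)
kept = precise_sum(data)
```

In this example `lossy` is `0.0` and `kept` is `1.0`. The regr functions accumulate sums of squares and cross products, where this kind of cancellation is common.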

Comment by Daniel Lee (Inactive) [ 2019-02-08 ]

Build verified: 1.2.3-1 from buildbot nightly

server commit:
61f32f2
engine commit:
46cc344

Variance in the results is reduced for all regr_* functions. regr_r2(), regr_slope(), and corr() now match the Oracle reference.
