[MCOL-4516] regr_r2 and potentially other statistical functions can't handle tiny negative float values as variance Created: 2021-01-27  Updated: 2021-04-30  Resolved: 2021-03-23

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 5.4.1
Fix Version/s: 5.6.1

Type: Bug Priority: Minor
Reporter: Roman Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-641 Full DECIMAL support in ColumnStore Closed
relates to MCOL-4188 Verify the whole Full DECIMAL patch a... Closed
Sprint: 2021-2, 2021-3, 2021-4, 2021-5

 Description   

The current statistical functions code doesn't take into account a minuscule negative deviation value that is actually represents a float 0.
According with the basic properties of Variance [1] and the regr_r2 calculation algo [2] the covar_popy must be non-negative so that sqrt(covar_popy) in regr_r2::evaluate() doesn't return NaN.

4QA This query must return 0.0000000000 according with the reference

{ select round(regr_r2(l_tax, l_extendedprice),10) from lineitem where l_tax = 0.05 group by l_tax order by l_tax; }

1. https://en.wikipedia.org/wiki/Variance
2. https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions139.htm#SQLRF00696



 Comments   
Comment by Gregory Dorman (Inactive) [ 2021-01-29 ]

David.Hall - do we believe this will go away with Full DECIMAL?

Can you double check "affects versions"?

Comment by David Hall (Inactive) [ 2021-02-02 ]

No, it will not go away since statistical calculations are done in long double. This is industry standard.
It's simple. Change "<value> == 0" to "<value> <= 0". Check all the regr functions for this bug.

Comment by Roman [ 2021-02-02 ]

I didn't get where we should apply the change suggested but my suggestion is to check the regr_* functions code to avoid using negative arguments with sqrt() function.

Comment by Alexander Barkov [ 2021-03-05 ]

The patches at
https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1776
now look fine for me.
However, can you please squash them into a single patch?

Ok to push after that.

Thanks.

Comment by Daniel Lee (Inactive) [ 2021-03-23 ]

Build verified: 5.6.1 ( Drone #1915 )

[centos8:root~]# /data/qa/autopilot/databases/dbt3/sh/buildDatabase.sh tpch1 columnstore 1g
[centos8:root~]# mysql tpch1
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 10.5.10-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [tpch1]> select round(regr_r2(l_tax, l_extendedprice),10) from lineitem where l_tax = 0.05 group by l_tax order by l_tax;
-------------------------------------------

round(regr_r2(l_tax, l_extendedprice),10)

-------------------------------------------

0.0000000000

-------------------------------------------
1 row in set (0.462 sec)

Generated at Thu Feb 08 02:50:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.