[MCOL-4516] regr_r2 and potentially other statistical functions can't handle tiny negative float values as variance Created: 2021-01-27 Updated: 2021-04-30 Resolved: 2021-03-23 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | 5.4.1 |
| Fix Version/s: | 5.6.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Roman | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Sprint: | 2021-2, 2021-3, 2021-4, 2021-5 | ||||||||||||
| Description |
|
The current statistical functions code doesn't take into account a minuscule negative deviation value that is actually represents a float 0. 4QA This query must return 0.0000000000 according with the reference { select round(regr_r2(l_tax, l_extendedprice),10) from lineitem where l_tax = 0.05 group by l_tax order by l_tax; }1. https://en.wikipedia.org/wiki/Variance |
| Comments |
| Comment by Gregory Dorman (Inactive) [ 2021-01-29 ] | ||
|
David.Hall - do we believe this will go away with Full DECIMAL? Can you double check "affects versions"? | ||
| Comment by David Hall (Inactive) [ 2021-02-02 ] | ||
|
No, it will not go away since statistical calculations are done in long double. This is industry standard. | ||
| Comment by Roman [ 2021-02-02 ] | ||
|
I didn't get where we should apply the change suggested but my suggestion is to check the regr_* functions code to avoid using negative arguments with sqrt() function. | ||
| Comment by Alexander Barkov [ 2021-03-05 ] | ||
|
The patches at Ok to push after that. Thanks. | ||
| Comment by Daniel Lee (Inactive) [ 2021-03-23 ] | ||
|
Build verified: 5.6.1 ( Drone #1915 ) [centos8:root~]# /data/qa/autopilot/databases/dbt3/sh/buildDatabase.sh tpch1 columnstore 1g Welcome to the MariaDB monitor. Commands end with ; or \g. Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch1]> select round(regr_r2(l_tax, l_extendedprice),10) from lineitem where l_tax = 0.05 group by l_tax order by l_tax;
-------------------------------------------
------------------------------------------- |