[MCOL-1985] Fix up regr functions in regrmysql.cpp so regr_* funtions work correcly for InnoDB tables Created: 2018-11-29 Updated: 2019-10-28 Resolved: 2019-07-05 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | MDB Plugin |
| Affects Version/s: | 1.2.1 |
| Fix Version/s: | 1.2.5 |
| Type: | Bug | Priority: | Minor |
| Reporter: | David Hall (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Sprint: | 2019-04, 2019-05, 2019-06 |
| Description |
|
regrmysql.cpp is the UDAF plugin for the regr_*** functions. It hasn't been kept up to date. The idea is that we also add the regr_*** functions to all engines via the Server API. However, the algorithms haven't been kept up with the changes in the Columnstore implementations. In addition, they return a fixed decimal of 2 decimal places, which isn't correct. |
| Comments |
| Comment by David Hall (Inactive) [ 2019-04-18 ] |
|
QA: The following queries should produce the same answer in InnoDB as Columnstore. In some cases, the number of decimals will be different, but it should never happen that InnoDB has fewer decimals than Columnstore in the regr_*** column. In these queries, I created an inno_**** set of the database. If you already have a reference database set up that happens to have the Columnstore libs, then you can use that (change the table names in the queries). These queries are not magic. What's important is that InnoDB and Columnstore match in the results of the regr_*** functions. Any query that returns a non-zero result can be used for comparison, but it is important to check that the number of digits to the right of the decimal for innodb is at least that of Columnstore. A true test would use Int, double and decimal types. Also testing as a Window function might be appropriate. Be aware of WARNING: These queries can run a long time on InnoDB, and really long if there's no indexes. A smaller data sample might be better. If you have a MariaDB reference without the Columnstore libs, you can copy libregr_mysql.so.1.0.0 from columnstore/mysql/lib to your mariadb installation. and create a symlink to libregr_mysql.so. CREATE the functions in a mariadb client: CREATE AGGREGATE FUNCTION corr returns REAL soname 'libregr_mysql.so'; SELECT empno, depname, salary, bonus, regr_avgx(empno, bonus) OVER (ORDER BY empno), MAX(bonus) OVER (ORDER BY empno) from (select *, (2013 - extract(YEAR FROM enroll_date)) * 500 as bonus from inno_empsalary3 where (empno >10000) and (empno < 10100) ) s order by depname, salary, bonus; select avg(distinct l_extendedprice), l_tax, regr_avgy(l_tax, l_extendedprice) from inno_lineitem group by l_tax order by l_tax; select count(l_extendedprice), l_tax, regr_count(l_tax, l_extendedprice) from inno_lineitem group by l_tax order by l_tax; select avg(l_extendedprice), l_tax, round(regr_intercept(l_tax, l_extendedprice), 8) intercept from inno_lineitem group by l_tax order by l_tax; select avg(l_extendedprice), l_tax, round(regr_r2(l_tax, l_extendedprice), 12) from inno_lineitem group by l_tax order by l_tax; select avg(l_extendedprice), l_tax, regr_sxx(l_partkey, l_suppkey) from inno_lineitem group by l_tax order by l_tax; select n_name, l_commitdate, round(regr_sxy(l_extendedprice, s_acctbal), 3) regr_sxy_bal, select p_brand, sum(l_quantity) tot_qty, select p_brand, sum(l_quantity) tot_qty, round(corr(l_tax, l_quantity),3) corr_qty, count select p_brand, sum(l_quantity) tot_qty, round(covar_pop(l_tax, l_quantity),3) covar_pop_qty, count select p_brand, sum(l_quantity) tot_qty, round(covar_samp(l_tax, l_quantity),3) covar_samp_qty, count |
| Comment by David Hall (Inactive) [ 2019-04-23 ] |
|
Added new PR#741 |
| Comment by Daniel Lee (Inactive) [ 2019-05-14 ] |
|
Testing has been completed on nightly build. Please review test results and checkout the issues that have been identified. |
| Comment by David Hall (Inactive) [ 2019-05-21 ] |
|
The functions are returning the correct answers in both InnoDB and Columnstore. There's a significant discrepancy between Columnstore and an Oracle reference run. The discrepancy appears to be caused by a difference in how the final sort is done. Columnstore (and MariaDB) treat the output of the regr_*** window functions as REAL and sort numerically. Oracle treats these as strings and sorts as such: select o_custkey, regr_sxx(o_custkey,o_orderkey) OVER (PARTITION BY 55 ORDER BY o_custkey ,o_orderkey ROWS 15 PRECEDING) from orders order by 1, 2; From columnstore: From oracle: So naturally, a diff of these results shows significant differences. |
| Comment by Daniel Lee (Inactive) [ 2019-07-05 ] |
|
Build verified: 1.2.5-1 nightly server commit: |