[MCOL-3503] Create MODA function for columnstore Created: 2019-09-16 Updated: 2022-02-28 Due: 2019-09-25 Resolved: 2019-11-18 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | 1.4 |
| Fix Version/s: | 1.4.1 |
| Type: | New Feature | Priority: | Major |
| Reporter: | David Hall (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Epic Link: | ColumnStore Compatibility Improvements | ||||
| Sprint: | 2019-06 | ||||
| Description |
|
A new aggregate function MODA which takes the mode of the input numeric field and determines the mode. Tie breakers are: Closest to the AVG, then smallest absolute value. |
| Comments |
| Comment by David Hall (Inactive) [ 2019-09-25 ] | ||||||||
|
The Columnstore UDAF uses a template class to differentiate the types of input and allows for smallest type in the hash table. The Server UDF uses the return type to dictate the datatype of the hash table. This does not allow for the smallest possible hash table. The Return Types are INT_RESULT, REAL_RESULT, DECIMAL_RESULT and STRING_RESULT, with DECIMAL_RESULT expected to return a c string. So we have int64_t, double, char*, char*. That's our options. There doesn't appear to be any way to determine the characteristics of the incoming column. For decimal, I chose to store as long double, as that seemed more efficient than trying to store the incoming decimal strings. | ||||||||
| Comment by David Hall (Inactive) [ 2019-09-30 ] | ||||||||
|
There's a regression test pull request as well | ||||||||
| Comment by Daniel Lee (Inactive) [ 2019-10-07 ] | ||||||||
|
Build tested: 1.4.1-1 source Engine code: [root@localhost engine]# git show The moda() should return 2. I worked with Mr. Hall and there seems to be an issue here: MariaDB [(none)]> select * from mysql.func where name = 'moda';
-----
----- After I changed the value for ret to 4 and restarted the system, the same query would return the correct result. | ||||||||
| Comment by David Hall (Inactive) [ 2019-10-07 ] | ||||||||
|
Changed return type to 4 (DECIMAL_RESULT) in install_calpont_mysql.sh | ||||||||
| Comment by Daniel Lee (Inactive) [ 2019-10-08 ] | ||||||||
|
Build tested: 1.4.1-1 source [root@localhost engine]# git show Merge pull request #896 from mariadb-corporation/updateGitIgnore update .gitignore to reflect generated scripts For unsigned integer and unsigned bigint, incorrect result is returned: -------------- Query OK, 0 rows affected (0.168 sec) -------------- Query OK, 9 rows affected (0.580 sec) -------------- ---------
---------
--------- -------------- ----------
----------
---------- -------------- Query OK, 0 rows affected (0.172 sec) -------------- Query OK, 9 rows affected (0.890 sec) -------------- ---------
---------
--------- -------------- ----------
----------
---------- Both queries should return 2 instead of 8. Also verify moda() with subquery, functions, and having clause. | ||||||||
| Comment by Daniel Lee (Inactive) [ 2019-10-09 ] | ||||||||
|
The following test case also failed, for both SIGN and UNSIGNED data types: Does it have something to do with the failed data types being integers? Example for TINYINT. A return value of 6 is expected Query OK, 0 rows affected (0.247 sec) -------------- Query OK, 9 rows affected (0.633 sec) -------------- ---------
---------
--------- -------------- --------
--------
-------- | ||||||||
| Comment by David Hall (Inactive) [ 2019-10-11 ] | ||||||||
|
For unsigned, the issue is this: The second issue with the return value being 5 rather than 6 is indeed due to INTEGER arithmetic. AVG is being truncated before the compare, so the AVG 5.6667 is turned into 5. | ||||||||
| Comment by David Hall (Inactive) [ 2019-10-11 ] | ||||||||
|
The fix is to use long double arithmetic for tie breaking regardless of the column data type. Long double gives us the full unsigned int precision for integers (on most platforms), while double would have more rounding error for large values. | ||||||||
| Comment by Daniel Lee (Inactive) [ 2019-10-30 ] | ||||||||
|
Build tested: 1.4.1-1 [dlee@master centos7]$ cat gitversionInfo.txt Encountered the following issue. Investigated on the issue but have not figured out the cause. MariaDB [mytest]> select moda(c1) from mcol3505; | ||||||||
| Comment by Daniel Lee (Inactive) [ 2019-11-18 ] | ||||||||
|
Build verfied: 1.4.1-1 github source Server Engine Make a build from source and verified test case |