[MCOL-5161] MODA() to support char and varchar Created: 2022-07-15  Updated: 2022-08-16  Resolved: 2022-08-16

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: None
Fix Version/s: 22.08.1

Type: New Feature Priority: Major
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MCOL-5092 group by (moda) on FLOAT result in In... Closed
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

Our current implementation of the aggregate MODA supports all numeric types. Implementations by other databases support string types as well. I would expect some users to want such functionality.



 Comments   
Comment by David Hall (Inactive) [ 2022-07-29 ]

This functionality was incorporated in MCOL-5092

Comment by David Hall (Inactive) [ 2022-07-29 ]

QA: Moda() takes the input column (after filters) and finds the value with the most occurrences. If there's a tie for numerical types it grabs the one closest to avg(). If there's still a tie, it uses the smallest one.

For string types, there is no concept of avg(), so in case of a tie, it should choose the one with the lowest collation order.

We should test for both CHAR with lengths < 9 and lengths > 8 and VARCHAR with lengths < 8 and lengths > 7. Note that this is byte lengths. We should also test with various collations and character sets with multi-byte characters.

Comment by Daniel Lee (Inactive) [ 2022-08-15 ]

Build tested: 22.08-1 (#5290)

Tested the following test case using different data types char(5), char(13), varchar(5), and varchar(13).

Using numeric strings

MariaDB [mytest]> CREATE TABLE tchar (C1 char(5)) ENGINE=COLUMNSTORE
    -> ;
Query OK, 0 rows affected (1.119 sec)
 
MariaDB [mytest]> INSERT tchar VALUES ('1'),('2'),('2'),('2'),('3'),('4'),('8'),('8'),('8');
Query OK, 9 rows affected (0.246 sec)
Records: 9  Duplicates: 0  Warnings: 0
 
MariaDB [mytest]> SELECT MODA(C1) FROM tchar;
+----------+
| MODA(C1) |
+----------+
| 8        |
+----------+
1 row in set (0.028 sec)
 
MariaDB [mytest]> select * from tchar order by c1;
+------+
| C1   |
+------+
| 1    |
| 2    |
| 2    |
| 2    |
| 3    |
| 4    |
| 8    |
| 8    |
| 8    |
+------+
9 rows in set (0.004 sec)
 
MariaDB [mytest]> insert into tchar values('9');
Query OK, 1 row affected (1.188 sec)
 
MariaDB [mytest]> SELECT MODA(C1) FROM tchar;
+----------+
| MODA(C1) |
+----------+
| 8        |
+----------+
1 row in set (0.014 sec)

Using strings

MariaDB [mytest]> CREATE TABLE tvchar (C1 varchar(13)) ENGINE=COLUMNSTORE
    -> ;
Query OK, 0 rows affected (1.112 sec)
 
MariaDB [mytest]> INSERT tvchar VALUES ('t1'),('t2'),('t2'),('t2'),('t3'),('t4'),('t8'),('t8'),('t8');
Query OK, 9 rows affected (0.145 sec)
Records: 9  Duplicates: 0  Warnings: 0
 
MariaDB [mytest]> insert into tvchar values('t9'),('t9'), ('t9');
Query OK, 3 rows affected (0.171 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [mytest]> SELECT MODA(C1) FROM tvchar;
+----------+
| MODA(C1) |
+----------+
| t8       |
+----------+
1 row in set (0.028 sec)

Comment by Daniel Lee (Inactive) [ 2022-08-15 ]

2 or t2 were expected from the tests above.

Comment by Daniel Lee (Inactive) [ 2022-08-16 ]

Reopen per the last test result.

Comment by Daniel Lee (Inactive) [ 2022-08-16 ]

Build verified: 22.08-1 (#5303)

Generated at Thu Feb 08 02:55:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.