[MCOL-3536] Order by with UTF
Created: 2019-10-01 | Updated: 2021-03-19 | Resolved: 2020-06-24 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | 1.2, 1.4 |
| Fix Version/s: | 1.5.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hall (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Sprint: | 2019-06, 2020-1, 2020-2, 2020-3, 2020-4, 2020-5, 2020-6, 2020-7 |
| Description |
|
We now use the internal ORDER BY of columnstore rather than relying on Server to do the ORDER BY. However, our current collation doesn't allow for anything other than Latin-1. A means of getting the collation type from the table and passing it to the collation step must be created. Then the sort itself must be made to use it. In addition, the WINDOW FUNCTION ORDER BY must use this same collation. |
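As a rough illustration of the idea in this description (not ColumnStore code), the sketch below passes a per-column collation through to a sort step as a key function. The collation names are borrowed from MariaDB, but the `COLLATION_KEYS` table, the `sort_rows` helper, and the use of Python's `str.casefold` as a stand-in for a `_ci` collation are assumptions made for this example; real MariaDB collations apply their own weight tables.

```python
# Hypothetical sketch: a sort step that takes its comparison rule from the
# column's collation instead of hard-coding a single byte-wise compare.

def binary_key(value: str) -> bytes:
    # Stand-in for a *_bin collation: compare by encoded bytes.
    return value.encode("utf-8")

def case_insensitive_key(value: str) -> str:
    # Stand-in for a *_ci collation: fold case before comparing.
    # Real collations use weight tables, not casefold().
    return value.casefold()

# Assumed mapping from collation name to key function (illustrative only).
COLLATION_KEYS = {
    "utf8mb4_bin": binary_key,
    "utf8mb4_general_ci": case_insensitive_key,
}

def sort_rows(rows, column, collation):
    """Order rows by one column using the key for the given collation."""
    key = COLLATION_KEYS[collation]
    return sorted(rows, key=lambda row: key(row[column]))

rows = [{"name": "banana"}, {"name": "Apple"}, {"name": "apple"}, {"name": "Zebra"}]
binary_order = [r["name"] for r in sort_rows(rows, "name", "utf8mb4_bin")]
ci_order = [r["name"] for r in sort_rows(rows, "name", "utf8mb4_general_ci")]
```

With the binary key, upper-case values sort ahead of all lower-case ones; with the case-folding key, "Apple" and "apple" sort as equals and keep their input order.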
| Comments |
| Comment by Roman [ 2019-10-17 ] |
|
Please note that native ColumnStore sorting is also used with subqueries, so this bug also affects all prior versions, including 1.2. |
| Comment by Andrew Hutchings (Inactive) [ 2019-10-18 ] |
|
Merged into 1.2.6 |
| Comment by David Hall (Inactive) [ 2019-11-11 ] |
|
This is on hold until we decide which path to take for collation support: use ICU or import the MariaDB code. |
| Comment by Roman [ 2019-11-11 ] |
|
The MariaDB team is ready to make a library out of their encoding subsystem. |
| Comment by David Hall (Inactive) [ 2020-04-23 ] |
|
This breaks working_tpch1/qa_fe_cnxFunctions/unix_timestamp.sql. The ref file has been modified for now. It needs to be fixed when this fix is in. |
| Comment by Gagan Goel (Inactive) [ 2020-04-23 ] |
|
This also breaks working_tpch1_compareLogOnly/distinctAggregationAndGroupBy/distinctorders.sql. The ref file has been temporarily changed to get the test to pass; update it when the fix is in. |
| Comment by Roman [ 2020-06-11 ] |
|
David.Hall Could you suggest methods our QA can use to test the supported feature subset? |
| Comment by David Hall (Inactive) [ 2020-06-15 ] |
|
QA: This affects ORDER BY with char types. ORDER BY will now use the collation of the ORDER BY column(s). All character sets default to case insensitive. For each case insensitive collation, there is generally a case sensitive collation available.

For DLee, you may want to try Mandarin: COLLATE "gb2312_chinese_ci", "gbk_chinese_ci" or "big5_chinese_ci". For case sensitive you could try "gb2312_bin", "gbk_bin" or "big5_bin".

This change also affects many of the functions that work with strings. Here, capital and lower case values should compare as if they were equal (for case insensitive collations).

These were added Monday June 15, so you may want to wait for the patch to be merged.
These were in the Friday June 12 build.
(1) Code was changed, so it needs to be tested, but unless you know of a language that doesn't use 1, 2, 3 etc., functionality hasn't changed.

Joins are not yet case insensitive and still work with a binary compare. See

Order and compare functionality is controlled by the prevailing character set/collation of each column involved. If comparing columns with different collations, we are not as smart as Server at choosing the best collation to use, so there may be some differences in this very rare edge case. |
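To make the join caveat above concrete for testing, here is a small sketch of why a binary join can return fewer rows than a case insensitive comparison would suggest. It is plain Python, not ColumnStore code; `join_on`, the sample values, and `str.casefold` as an approximation of a `_ci` collation are all assumptions for illustration.

```python
def join_on(left, right, equal):
    """Nested-loop join returning (l, r) pairs for which equal(l, r) is true."""
    return [(l, r) for l in left for r in right if equal(l, r)]

def binary_equal(a: str, b: str) -> bool:
    # Byte-for-byte comparison, as the joins are described as doing for now.
    return a.encode("utf-8") == b.encode("utf-8")

def ci_equal(a: str, b: str) -> bool:
    # Approximation of a case insensitive collation (assumption: casefold).
    return a.casefold() == b.casefold()

left = ["Paris", "berlin"]
right = ["paris", "Berlin", "berlin"]

binary_matches = join_on(left, right, binary_equal)
ci_matches = join_on(left, right, ci_equal)
```

A QA check along these lines would expect a join on differently cased values to match only exact byte-equal strings, even when ORDER BY on the same column treats them as equal.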
| Comment by Daniel Lee (Inactive) [ 2020-06-23 ] |
|
Build tested: 1.5.2-1 (community edition, b33685)

Did the first round of testing; the following are the preliminary results.

DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

Tested:
LDI without cpimport
self join
substring
insert
select english, chinese, insert(chinese,1,2, '哈羅') from chinesecol;
-------------
innodb returns:
-------------
least

issues:
1. LDI with cpimport loads UTF8mb4 characters as NULL |
| Comment by David Hall (Inactive) [ 2020-06-24 ] |
|
1. LDI is being looked at. It is a separate issue from this JIRA. |
| Comment by Daniel Lee (Inactive) [ 2020-06-24 ] |
|
The requested feature has been implemented. Known and new issues are being tracked by respective individual tickets. Closing this ticket. |