[MCOL-4727] Queries which use GROUP BY with ORDER BY HEX return more rows than the rules of character identity expects in some collations. Created: 2021-05-18 Updated: 2022-02-18 |
|
| Status: | Open |
| Project: | MariaDB ColumnStore |
| Component/s: | PrimProc |
| Affects Version/s: | 5.5.1 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Barkov | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
I run this script:
Looks good so far. In the given collation:
If I do not use HEX in ORDER BY - everything is fine also.
Things go wrong if do ORDER BY HEX(c1):
Notice, the numbers of records in the result sets have changed. These two queries are expected to return two rows each. |
| Comments |
| Comment by David Hall (Inactive) [ 2021-05-21 ] | |||||||
|
The problem occurs when you give an ORDER BY that conflicts with the aggregate. It depends on how the engine interprets it. Should it run the query, get a result set, then look at the function in the ORDER BY? This appears to be what InnoDB does.
So in the first case, we get two rows that HEX is applied to. The problem is exacerbated by the fact that these functions are run in PrimProc. Order By is run in ExeMgr. This is why the function is sent as part of the query to PrimProc. We may need to re-engineer a bunch of stuff such that functions that are part of Order By are not sent to PrimProc, but rather just ask for the value of the columns in the function. Then run the function in ExeMgr as part of the Order By By process. Some additional thought needs to be put into this, since ORDER BY can contain multiple functions, nested functions, and functions on columns that are not part of the query's result set. How do you handle
Other stange combinations can be found, I'm sure. Each will have to be tested in InnoDB to see what behavior is needed. | |||||||
| Comment by David Hall (Inactive) [ 2021-05-21 ] | |||||||
|
Answer to the last example (InnoDB)
Even though, in this example, c2 had six unique values, the query still managed to get only two rows. It's obvious it chose the value for the first row in each group. | |||||||
| Comment by Alexander Barkov [ 2021-05-21 ] | |||||||
|
A workaround is to use this query with ColumnStore:
|