[MCOL-2105] Improve disk join behavior for unfortunate data distributions Created: 2019-01-24 Updated: 2023-07-02 Resolved: 2023-07-02 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | 1.0.15, 1.1.6, 1.2.2 |
| Fix Version/s: | Icebox |
| Type: | Task | Priority: | Major |
| Reporter: | Patrick LeBlanc (Inactive) | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Description |
|
We've recently run into a couple of cases where the disk join was triggered, but so many rows shared a single value in the join column that they overflowed a partition during the loading stage; no hash-based repartitioning can split rows that share a join value. We currently reject the query when that happens, but it should be easy to handle as a special case. For reference, the disk join algorithm is called 'GRACE'; you can google 'grace join algorithm' to get a better understanding of what it's doing. To find the place in the code where it detects the data distribution problem, grep for 'ERR_DBJ_DATA_DISTRIBUTION' in joinpartition.cpp. My initial thoughts. It'll take some understanding of the partitioning structures and behavior; whoever gets this one, feel free to ask me. |
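To illustrate the special case, here is a minimal in-memory sketch of a GRACE-style join. It assumes a simplified model in which a "partition" is a Python list, `max_rows` stands in for the memory limit, and the build side is the smaller input. All names (`grace_join`, `single_key_join`, `partition_of`, `MAX_DEPTH`) and the splitmix64 hash are hypothetical illustrations, not ColumnStore's code in joinpartition.cpp. An oversized partition is recursively repartitioned with a level-salted hash. When a partition is still too large but every build row carries the same key (roughly the situation that ends in ERR_DBJ_DATA_DISTRIBUTION), the sketch falls back to a chunked nested-loop join instead of rejecting the query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int]      # hypothetical row shape: (join key, payload)
Match = Tuple[int, int]    # (build payload, probe payload)

MASK = (1 << 64) - 1
MAX_DEPTH = 16  # hypothetical cap on repartitioning passes


def mix(x: int) -> int:
    """splitmix64 finalizer: scrambles a 64-bit integer into a well-mixed hash."""
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)


def partition_of(key: int, level: int, fanout: int) -> int:
    # Salt the hash with the recursion level so each repartition pass splits keys differently.
    return mix((key & MASK) ^ mix(level)) % fanout


def hash_join(build: List[Row], probe: List[Row]) -> List[Match]:
    # Ordinary in-memory hash join, used once a build partition fits in memory.
    table: Dict[int, List[int]] = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    return [(b, p) for key, p in probe for b in table.get(key, ())]


def single_key_join(build: List[Row], probe: List[Row], max_rows: int) -> List[Match]:
    # Special case: every build row has the same key, so no hash can split the partition.
    # Every probe row with that key matches every build row, so a block nested-loop join
    # over max_rows-sized chunks of the build side keeps memory bounded.
    key = build[0][0]
    out: List[Match] = []
    for start in range(0, len(build), max_rows):
        chunk = build[start:start + max_rows]
        for k, p in probe:  # re-scan the probe partition once per build chunk
            if k == key:
                out.extend((b, p) for _, b in chunk)
    return out


def grace_join(build: List[Row], probe: List[Row], max_rows: int,
               fanout: int = 4, level: int = 0) -> List[Match]:
    if len(build) <= max_rows:
        return hash_join(build, probe)
    if all(k == build[0][0] for k, _ in build):
        # Roughly where a reject-the-query check would fire; handle it instead.
        return single_key_join(build, probe, max_rows)
    if level >= MAX_DEPTH:
        raise RuntimeError("partition still exceeds max_rows after MAX_DEPTH passes")
    b_parts: List[List[Row]] = [[] for _ in range(fanout)]
    p_parts: List[List[Row]] = [[] for _ in range(fanout)]
    for row in build:
        b_parts[partition_of(row[0], level, fanout)].append(row)
    for row in probe:
        p_parts[partition_of(row[0], level, fanout)].append(row)
    out: List[Match] = []
    for b, p in zip(b_parts, p_parts):
        if b:  # inner join: an empty build partition produces no matches
            out.extend(grace_join(b, p, max_rows, fanout, level + 1))
    return out


build = [(7, i) for i in range(10)] + [(100 + i, 1000 + i) for i in range(6)]
probe = [(7, 2000 + i) for i in range(3)] + [(100, 3000), (42, 3001)]
result = grace_join(build, probe, max_rows=4)
print(len(result))  # 31: 10 x 3 pairs for key 7, plus 1 pair for key 100
```

The design trade-off in this sketch is that the single-key fallback bounds memory to `max_rows` build rows at a time, at the cost of re-scanning the probe partition once per build chunk. Since every matching probe row pairs with every build row, the output for that key is inherently a cross product, so the extra scans are usually small next to the output size.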
| Comments |
| Comment by Patrick LeBlanc (Inactive) [ 2019-01-24 ] |
|
Added the support case |
| Comment by Patrick LeBlanc (Inactive) [ 2019-02-15 ] |
|
The workaround suggested in |
| Comment by Todd Stoffel (Inactive) [ 2023-07-02 ] |
|
The "create date" on this ticket is pre-convergence with MariaDB server. If the issue still exists in a modern version of the engine/plugin please submit a new ticket. |