[MCOL-4534] MariaDB collation library: improve comparison performance in 8bit nopad_bin collations Created: 2021-02-05 Updated: 2023-02-08 |
|
| Status: | Stalled |
| Project: | MariaDB ColumnStore |
| Component/s: | MariaDB Server |
| Affects Version/s: | 5.5.1 |
| Fix Version/s: | 23.10 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Barkov | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Attachments: |
| Issue Links: |
| Sprint: | 2021-2, 2021-3, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12 |
| Description |
2021-03-02 Update
Before starting work on this issue, we probably need to implement MCOL-4568 first. The values passed to the new function in the MariaDB collation library must be padded with spaces rather than zero bytes.
2021-02-15 Update
Note: the technique described below is only valid for xxx_nopad_bin collations (those with the NO PAD attribute).
Actual description
After collation support was added to ColumnStore, performance of the comparison operator degraded for short CHAR columns, even for _bin collations.
The attached patch reconstructs the old ColumnStore behavior inside the MariaDB collation library. It does make comparison faster for CHAR(4). Benchmarking a 10.5 RelWithDebInfo build before and after applying the patch:
Comparison performance for INT and latin1_swedish_ci (for reference)
Comparison performance for BINARY, CHAR(4) latin1_bin, CHAR(4) latin1_nopad_bin
For `clean 10.5` versus `10.5 with patch applied`
Note that comparison of CHAR(4) latin1_bin (versus comparison of INT data) is:
The patch gives a good performance improvement, around 25%.
Let's add new methods to my_collation_handler_st, with the following tentative API:
(and corresponding wrapper methods in CHARSET_INFO). ColumnStore will then be able to use these optimized comparison functions for short CHAR and VARCHAR data. ColumnStore stores short CHAR values in memory in numeric format, in either 4 bytes or 8 bytes depending on the width, so it will use:
|
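The idea behind the description above can be illustrated with a minimal sketch. It is not the tentative API from this issue and not code from the attached patch: the names `load_be32`, `load_be64`, `compare_bin_4bytes`, `compare_bin_8bytes` and `check_against_memcmp` are hypothetical. The sketch assumes fixed-width 4-byte and 8-byte values that are already padded to full width, and a binary NO PAD collation, where the comparison result is the bytewise order. Reading the bytes as a big-endian unsigned integer makes a single integer comparison produce the same ordering as `memcmp`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers, for illustration only.
// Build a big-endian integer from the bytes so that unsigned integer order
// matches bytewise (memcmp) order regardless of host endianness.
static inline uint32_t load_be32(const unsigned char* p)
{
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

static inline uint64_t load_be64(const unsigned char* p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | p[i];
  return v;
}

// Compare two 4-byte values; returns -1, 0 or 1.
static inline int compare_bin_4bytes(const unsigned char* a, const unsigned char* b)
{
  const uint32_t x = load_be32(a);
  const uint32_t y = load_be32(b);
  return (x > y) - (x < y);
}

// Compare two 8-byte values; returns -1, 0 or 1.
static inline int compare_bin_8bytes(const unsigned char* a, const unsigned char* b)
{
  const uint64_t x = load_be64(a);
  const uint64_t y = load_be64(b);
  return (x > y) - (x < y);
}

// Reduce an int to its sign, because memcmp only guarantees the sign of its result.
static inline int sign(int v)
{
  return (v > 0) - (v < 0);
}

// Check that the integer comparison agrees with memcmp.
// Both arguments must point to at least 8 readable bytes.
static void check_against_memcmp(const unsigned char* a, const unsigned char* b)
{
  assert(compare_bin_4bytes(a, b) == sign(std::memcmp(a, b, 4)));
  assert(compare_bin_8bytes(a, b) == sign(std::memcmp(a, b, 8)));
}
```

Calling `check_against_memcmp` on two space-padded buffers such as `"abcd    "` and `"abce    "` exercises both widths. Whether an engine can skip the byte reordering depends on how it lays the value out in memory, which this sketch does not assume.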