[MDEV-37849] UCA: Change "skip equal simple prefix" to "compare simple prefix" - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 12.1.2, 12.2.1, 11.8.6
Component/s: Character Sets
Labels: None

Description

The UCA implementation uses an optimization for ASCII-compatible character sets (utf8mb3, utf8mb4), implemented in the function my_uca_level_booster_equal_prefix_length(). The idea is that if two strings share a simple prefix that is equal according to the collation, that prefix can be skipped quickly before the comparison enters the heavier, slower loop.

This optimization uses the member MY_UCA_LEVEL_BOOSTER::weight_strings_2bytes_to_1_or_2_weights.

"Simple" means that the prefix must satisfy the following conditions:

The data can be traversed two bytes at a time, i.e.:
- Every two bytes are either two ASCII characters or one 2-byte character
- There are no two-byte characters at an odd octet position
- There are no ASCII contraction heads at an odd octet position

Every two bytes produce one or two weights.
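
The "two bytes at a time" traversal can be illustrated with a minimal sketch. It is not the MariaDB code: classify_pair() and simple_prefix_length() are hypothetical names, and the sketch checks only the byte-level shape of each pair. It assumes UTF-8 input and ignores both contraction heads and the number of weights, which the real booster table has to take into account.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one 2-byte unit (illustrative names only). */
typedef enum { PAIR_TWO_ASCII, PAIR_ONE_2BYTE_CHAR, PAIR_NOT_SIMPLE } pair_kind;

static pair_kind classify_pair(unsigned char b0, unsigned char b1)
{
  if (b0 < 0x80 && b1 < 0x80)
    return PAIR_TWO_ASCII;
  /* UTF-8 2-byte sequence: lead byte 0xC2..0xDF, continuation 0x80..0xBF */
  if (b0 >= 0xC2 && b0 <= 0xDF && (b1 & 0xC0) == 0x80)
    return PAIR_ONE_2BYTE_CHAR;
  /* Anything else, e.g. a 2-byte character starting at an odd position */
  return PAIR_NOT_SIMPLE;
}

/* Length in bytes of the longest prefix that can be walked in 2-byte steps.
   Assumption: contractions and weight counts are NOT checked here. */
static size_t simple_prefix_length(const unsigned char *s, size_t len)
{
  size_t i= 0;
  assert(s != NULL || len == 0);
  while (i + 2 <= len && classify_pair(s[i], s[i + 1]) != PAIR_NOT_SIMPLE)
    i+= 2;
  return i;
}
```

In this sketch the bytes `'a' 0xC3 0xA9 'b'` give a simple prefix of length 0, because the 2-byte character starts at an odd octet position, while `0xC3 0xA9 'a' 'b'` is simple for all 4 bytes.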

Skipping the equal prefix works well when the compared strings are equal. However, it is not a good fit when many different strings are compared, e.g. when sorting an array of distinct strings.

Let's change the "skip equal simple prefix" approach to "compare simple prefix". The member MY_UCA_LEVEL_BOOSTER::weight_strings_2bytes_to_1_or_2_weights has almost everything needed for this.
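
The difference between the two approaches can be sketched with a toy model. Everything here is an assumption made for illustration: toy_lookup() stands in for the real 2-bytes-to-weights table, covers ASCII pairs only, and uses a case-insensitive weight; skip_equal_prefix() and compare_prefix() are hypothetical names, not the functions in the MariaDB source.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the 2-bytes-to-weights table: ASCII pairs only,
   case-insensitive. n == 0 means "not a simple pair". */
typedef struct { unsigned short w[2]; int n; } toy_pair_weights;

static unsigned short toy_weight(unsigned char c)
{
  return (unsigned short) ((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
}

static toy_pair_weights toy_lookup(unsigned char b0, unsigned char b1)
{
  toy_pair_weights r= {{0, 0}, 0};
  if (b0 < 0x80 && b1 < 0x80)
  {
    r.w[0]= toy_weight(b0);
    r.w[1]= toy_weight(b1);
    r.n= 2;
  }
  return r;
}

/* "Skip" approach: only reports how many bytes are equal; on the first
   difference the caller must recompute the weights in the slow loop. */
static size_t skip_equal_prefix(const unsigned char *a, size_t alen,
                                const unsigned char *b, size_t blen)
{
  size_t i= 0, n= alen < blen ? alen : blen;
  while (i + 2 <= n)
  {
    toy_pair_weights x= toy_lookup(a[i], a[i + 1]);
    toy_pair_weights y= toy_lookup(b[i], b[i + 1]);
    if (!x.n || !y.n || x.w[0] != y.w[0] || x.w[1] != y.w[1])
      break;
    i+= 2;
  }
  return i;
}

/* "Compare" approach: returns -1 or 1 as soon as the weights differ.
   Returns 0 if undecided; *consumed then tells the slow loop where to go on. */
static int compare_prefix(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen,
                          size_t *consumed)
{
  size_t i= 0, n= alen < blen ? alen : blen;
  int k;
  assert(consumed != NULL);
  while (i + 2 <= n)
  {
    toy_pair_weights x= toy_lookup(a[i], a[i + 1]);
    toy_pair_weights y= toy_lookup(b[i], b[i + 1]);
    if (!x.n || !y.n)
      break;
    for (k= 0; k < 2; k++)
      if (x.w[k] != y.w[k])
      {
        *consumed= i;
        return x.w[k] < y.w[k] ? -1 : 1;
      }
    i+= 2;
  }
  *consumed= i;
  return 0;
}
```

For "abcd" versus "ABxd", skip_equal_prefix() in this sketch returns 2 and leaves the decision to the slow loop, whereas compare_prefix() returns -1 directly from the weights it has already looked up. That saved work is the motivation for the change when the inputs are mostly different strings.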

After the changes are done, the implementer should use benchmarks to confirm that the new version is actually faster.

People

Assignee: Alexander Barkov

Reporter: Alexander Barkov

Votes: 0

Watchers: 3

Dates

Created: 2025-10-13 12:46

Updated: 2025-11-11 08:02

Resolved: 2025-11-11 06:24