[MDEV-8420] UCA: compare broken bytes as "greater than any non-broken character" Created: 2015-07-03  Updated: 2021-05-11

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Alexander Barkov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Issue Links:
Relates
relates to MDEV-8036 Fix all collations to compare broken ... Closed
relates to MDEV-8433 Make field<'broken-string' use indexes In Review

Description

UCA collations compare:

  • all broken mbminlen units as having weight 0xFFFF
  • all non-BMP characters that have no weight in the weight table for the current collation as having weight 0xFFFD
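
The two rules above can be illustrated with a small Python sketch. This is a toy model, not the MariaDB implementation: the function name uca_weights_sketch, the weight_table dictionary, and the fallback of using the code point as the weight for BMP characters missing from the table are all assumptions made for illustration. It assumes utf8, where mbminlen is 1, so each broken unit is a single byte.

```python
BROKEN_WEIGHT = 0xFFFF      # weight given to every broken mbminlen unit
NO_WEIGHT_NON_BMP = 0xFFFD  # weight given to non-BMP characters absent from the table


def uca_weights_sketch(data: bytes, weight_table: dict) -> list:
    """Return a toy weight sequence for a utf8 byte string (hypothetical helper)."""
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN,
    # which lets us spot broken bytes one unit at a time.
    text = data.decode("utf-8", errors="surrogateescape")
    weights = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Broken byte: the original byte value is discarded.
            weights.append(BROKEN_WEIGHT)
        elif cp in weight_table:
            weights.append(weight_table[cp])
        elif cp > 0xFFFF:
            # Non-BMP character with no weight in the table.
            weights.append(NO_WEIGHT_NON_BMP)
        else:
            # Assumption: placeholder weight for BMP characters not in the toy table.
            weights.append(cp)
    return weights
```

In this sketch, uca_weights_sketch(b"\xfe", {}) and uca_weights_sketch(b"\xff", {}) both yield [0xFFFF], so the value of the broken byte no longer influences the comparison.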

This is different from the other collations, which take into account byte values when comparing broken byte sequences. For example, strnncollsp(0xFE, 0xFF) for utf8_general_ci returns -1, because the broken byte value (0xFE) in the left operand is smaller than the broken byte value (0xFF) in the right operand.
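
The difference can be sketched in Python as follows. Both key functions are hypothetical simplifications, not the server's strnncollsp code: they ignore case folding and space padding, use code points as stand-in weights, and only model the one property under discussion, namely whether the value of a broken byte takes part in the comparison.

```python
def _units(data: bytes):
    """Yield (is_broken, value) for each unit of a utf8 byte string."""
    # surrogateescape turns each undecodable byte 0xNN into the surrogate U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            yield True, cp - 0xDC00  # recover the original byte value
        else:
            yield False, cp


def byte_aware_key(data: bytes) -> list:
    """Toy model of the non-UCA behaviour: broken bytes sort after all
    well-formed characters and are ordered by their byte value."""
    return [(1, value) if broken else (0, value) for broken, value in _units(data)]


def uca_like_key(data: bytes) -> list:
    """Toy model of the UCA behaviour: every broken unit gets the same key."""
    return [(1, 0) if broken else (0, value) for broken, value in _units(data)]


def compare(a: list, b: list) -> int:
    """Return -1, 0 or 1, in the style of a collation comparison function."""
    return (a > b) - (a < b)
```

Under these assumptions, compare(byte_aware_key(b"\xfe"), byte_aware_key(b"\xff")) gives -1, while compare(uca_like_key(b"\xfe"), uca_like_key(b"\xff")) gives 0.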

UCA collations, for consistency purposes, should perhaps be fixed to compare different broken bytes as non-equal, like the other collations do.

This task was originally created as a subtask of MDEV-8036, covering all UCA-based collations in all Unicode character sets, alongside the other subtasks of MDEV-8036, which in turn is needed for MDEV-8433. However, the UCA collations already seem to suit the needs of MDEV-8433, which should probably work without any changes to them. For search purposes, a broken string can appear in only one operand (the string literal), while the other operand (the field) contains well-formed strings, so the string comparison function should normally never compare two broken strings. MDEV-8420 has therefore been removed from the MDEV-8036 dependencies.
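
The search argument can be checked with a small Python sketch. It is a toy model under stated assumptions, not the server's code: toy_key and matches_less_than are hypothetical helpers, code points stand in for real collation weights, and broken bytes are assumed to sort after every well-formed character. When the field values are well formed, the result of field < 'broken-string' comes out the same whether or not broken byte values are distinguished.

```python
def toy_key(data: bytes, byte_aware: bool) -> list:
    """Build a toy sort key; broken bytes sort after all well-formed characters."""
    # surrogateescape turns each undecodable byte 0xNN into the surrogate U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    key = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # byte_aware=True keeps the byte value, byte_aware=False collapses it.
            key.append((1, cp - 0xDC00 if byte_aware else 0))
        else:
            key.append((0, cp))
    return key


def matches_less_than(fields: list, literal: bytes, byte_aware: bool) -> list:
    """Return the well-formed field values that compare less than the literal."""
    literal_key = toy_key(literal, byte_aware)
    return [f for f in fields if toy_key(f.encode("utf-8"), byte_aware) < literal_key]


fields = ["ab", "abc", "abd", "b"]
for literal in (b"ab\xfe", b"ab\xff"):
    # Both models select the same rows for a well-formed column.
    assert matches_less_than(fields, literal, True) == matches_less_than(fields, literal, False)
```

The two models only diverge when both operands contain broken bytes at the same position, which is the case the description expects not to arise in searches.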

