Details
Task
Status: Open
Minor
Resolution: Unresolved
Description
UCA collations compare:
- all broken mbminlen units as having weight 0xFFFF
- all non-BMP characters that have no weight in the weight table for the current collation as having weight 0xFFFD
This is different from the other collations, which take into account byte values when comparing broken byte sequences. For example, strnncollsp(0xFE, 0xFF) for utf8_general_ci returns -1, because the broken byte value (0xFE) in the left operand is smaller than the broken byte value (0xFF) in the right operand.
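The difference between the two behaviours can be sketched in a few lines of C. This is a toy model, not the server code: `uca_like_weight`, `bytewise_weight` and `compare_units` are hypothetical names, every byte >= 0x80 is treated as a broken unit because the sketch does not decode multi-byte sequences, the 0x10000 offset is an arbitrary value chosen so that broken bytes sort after well-formed ones, and the trailing-space handling of strnncollsp is left out.

```c
#include <stddef.h>

/* Toy assumption: only ASCII bytes are well formed; any byte >= 0x80
   counts as a broken unit (no multi-byte decoding in this sketch). */
static int is_broken(unsigned char b) { return b >= 0x80; }

/* UCA-like behaviour described above: every broken unit gets the same
   fixed weight 0xFFFF, so different broken bytes compare as equal. */
static unsigned int uca_like_weight(unsigned char b)
{
  return is_broken(b) ? 0xFFFFu : (unsigned int) b;
}

/* Byte-aware behaviour: the broken byte value is part of the weight.
   The 0x10000 offset is an arbitrary choice for this sketch. */
static unsigned int bytewise_weight(unsigned char b)
{
  return is_broken(b) ? 0x10000u + b : (unsigned int) b;
}

typedef unsigned int (*weight_fn)(unsigned char);

/* Compare two byte strings unit by unit using the given weight function.
   Returns -1, 0 or 1. Space padding is deliberately not modelled. */
static int compare_units(const unsigned char *a, size_t alen,
                         const unsigned char *b, size_t blen,
                         weight_fn weight)
{
  size_t n = alen < blen ? alen : blen;
  for (size_t i = 0; i < n; i++)
  {
    unsigned int wa = weight(a[i]);
    unsigned int wb = weight(b[i]);
    if (wa != wb)
      return wa < wb ? -1 : 1;
  }
  if (alen == blen)
    return 0;
  return alen < blen ? -1 : 1;
}
```

With these toy weights, comparing the single bytes 0xFE and 0xFF returns 0 under `uca_like_weight` and -1 under `bytewise_weight`, which mirrors the inconsistency described above.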
UCA collations, for consistency purposes, should perhaps be fixed to compare different broken bytes as non-equal, like the other collations do.
This task was originally created as a subtask of MDEV-8036, covering all UCA-based collations in all Unicode character sets, together with a set of other subtasks of MDEV-8036, which is needed for MDEV-8433. However, the UCA collations already seem to suit the needs of MDEV-8433, and MDEV-8433 should probably work without any changes in the UCA collations. For search purposes, only one operand (the string literal) can contain a broken string, while the other operand (the field) contains only well-formed strings, so the string comparison function should normally never compare two broken strings. MDEV-8420 has therefore been removed from the MDEV-8036 dependencies.