[MDEV-23400] Add UCA case sensitive accent sensitive collations for Unicode character sets
Created: 2020-08-04 Updated: 2022-08-18 Resolved: 2022-08-18
Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: N/A
Type: Task
Priority: Major
Reporter: Alexander Barkov
Assignee: Alexander Barkov
Resolution: Duplicate
Votes: 1
Labels: None
Issue Links:

Description

As of version 10.5.5, MariaDB supports the following collations for Unicode character sets (using utf8 as an example):
This script demonstrates the order provided by utf8_bin:
So MariaDB has collations with a good linguistic order for these comparison styles:
But it does not have collations with a good linguistic order for the case-sensitive and accent-sensitive comparison style. Let's implement linguistic case-sensitive and accent-sensitive collations for Unicode character sets. Tentative names: xxx_unicode_520_w3 (where xxx is utf8, utf8mb4, ucs2, utf16, or utf32). The new collations will use three levels of Unicode weights (primary: base letters, secondary: accents, tertiary: case). They will provide a much better sort order than utf8_bin; for example, small letters will appear before capital letters. Using the same data, the new collation will return records in the following order:
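The three-level comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not MariaDB's implementation: the weights are hypothetical (the lowercased base letter for level 1, the code points of combining accent marks after NFD decomposition for level 2, and 0/1 for lowercase/uppercase for level 3) rather than the real UCA 5.2.0 weight tables, and the sample words are made up, not the data from this issue. The helpers `split_base_and_marks` and `three_level_key` are hypothetical names. As in UCA, all level-1 weights of a string are compared first, then level 2, then level 3. Plain `sorted()` gives code-point order, which is what a binary collation such as utf8_bin yields for this data.

```python
import unicodedata


def split_base_and_marks(s):
    """Hypothetical helper: split NFD text into (base_char, accent_marks) units."""
    units = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and units:
            # Attach a combining accent to the preceding base character.
            base, marks = units[-1]
            units[-1] = (base, marks + (ord(ch),))
        else:
            units.append((ch, ()))
    return units


def three_level_key(s):
    """Hypothetical toy sort key; NOT the real UCA/DUCET weights."""
    units = split_base_and_marks(s)
    # Level 1 (primary): base letters only, ignoring accents and case.
    primary = tuple(base.lower() for base, _ in units)
    # Level 2 (secondary): accents; an empty mark tuple sorts before any mark.
    secondary = tuple(marks for _, marks in units)
    # Level 3 (tertiary): case; lowercase (0) sorts before uppercase (1).
    tertiary = tuple(1 if base.isupper() else 0 for base, _ in units)
    # Tuples compare all primary weights first, then secondary, then tertiary.
    return (primary, secondary, tertiary)


if __name__ == "__main__":
    words = ["b", "Á", "a", "B", "á", "A"]
    # Code-point order, as a binary collation would produce.
    print(sorted(words))  # ['A', 'B', 'a', 'b', 'Á', 'á']
    # Three-level order: small before capital, unaccented before accented.
    print(sorted(words, key=three_level_key))  # ['a', 'A', 'á', 'Á', 'b', 'B']
```

The toy key ignores most of UCA (contractions, expansions, ignorable characters, the full DUCET ordering), so it only shows why three weight levels keep `a`, `A`, `á`, `Á` together while code-point order scatters them.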
Open questions:

Comments
Comment by Alexander Barkov [ 2022-08-18 ]

Case and accent sensitive collations were added to MariaDB-10.10 under terms of