[MDEV-11255] LDML: allow defining 2-level UCA collations Created: 2016-11-08  Updated: 2016-11-08  Resolved: 2016-11-08

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 10.2.3

Type: Task
Priority: Major
Reporter: Alexander Barkov
Assignee: Alexander Barkov
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-10132 utf8_thai_520_w2 collation Closed
relates to MDEV-11199 Czech Unicode collations are wrong Closed

Description

Infrastructure for 2-level built-in collations was added recently; see MDEV-10132.

This task allows 2-level collations to be defined in Index.xml as well.

A definition like the following should be supported:

    <collation name="utf8_czech_520_w2" id="370" version="5.2.0">
      <settings strength="2"/>
      <rules>
        <reset>C</reset><p>\u010D</p><t>\u010C</t>
        <reset>H</reset><p>ch</p><t>Ch</t><t>CH</t>
        <reset>R</reset><p>\u0159</p><t>\u0158</t>
        <reset>S</reset><p>\u0161</p><t>\u0160</t>
        <reset>Z</reset><p>\u017E</p><t>\u017D</t>
      </rules>
    </collation>

Notice the new "strength" attribute in the "settings" tag.
Secondary weights are currently available for Unicode 5.2.0 only,
so a collation that specifies strength="2" should also specify version="5.2.0".
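
The constraint above can be sketched as a small standalone check. This is only an illustration in Python, not the server's actual Index.xml loader: the function name `check_collation` and the constant `VERSIONS_WITH_SECONDARY_WEIGHTS` are hypothetical, and the sketch assumes a single `<collation>` element is passed in as text.

```python
import xml.etree.ElementTree as ET

# Assumption taken from the description: secondary weights exist
# for Unicode 5.2.0 only. The constant name is hypothetical.
VERSIONS_WITH_SECONDARY_WEIGHTS = {"5.2.0"}


def check_collation(xml_text):
    """Return a list of problems found in one <collation> definition."""
    coll = ET.fromstring(xml_text)
    problems = []

    # The strength lives on the optional <settings> child element.
    settings = coll.find("settings")
    strength = settings.get("strength") if settings is not None else None

    # The UCA version is an attribute of <collation> itself.
    version = coll.get("version")

    if strength == "2" and version not in VERSIONS_WITH_SECONDARY_WEIGHTS:
        problems.append(
            'collation "%s": strength="2" requires version="5.2.0", got %r'
            % (coll.get("name"), version)
        )
    return problems
```

Under these assumptions, a definition with strength="2" and version="5.2.0" yields an empty list, while one that omits the version attribute yields a single problem string.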

The flag="nopad" attribute should also be understood, so that nopad 2-level collations can be defined:

    <collation name="utf8_czech_520_nopad_w2" id="371" version="5.2.0" flag="nopad">
      <settings strength="2"/>
      <rules>
        <reset>C</reset><p>\u010D</p><t>\u010C</t>
        <reset>H</reset><p>ch</p><t>Ch</t><t>CH</t>
        <reset>R</reset><p>\u0159</p><t>\u0158</t>
        <reset>S</reset><p>\u0161</p><t>\u0160</t>
        <reset>Z</reset><p>\u017E</p><t>\u017D</t>
      </rules>
    </collation>

