Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27210

New naming convention for UCA collations

Details

    • New Feature
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • Character Sets
    • None

    Description

      As of version 10.7, MariaDB understands the following flags in collation names:

      • _ci for case insensitive collations
      • _cs for case sensitive collations
      • _nopad_ for NO PAD collations

      We eventually want to support all customizations (collation preferences) as described in:
      https://unicode.org/reports/tr10/#Customization

      This new naming convention will encode more flags inside collation names.

      This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

      Collation name structure

      The whole collation name structure will consist of the following parts delimited by underscores:

      • Character set name
      • Unicode collation algorithm version: letters "uca" followed by two digit major version, one digit minor version, one digit patch version (e.g. uca1400 for Unicode-14.0.0).
      • Optional tailoring name (usually a language name). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
      • Flags, as described below

      PAD flags

      • _pad - NO PAD (default)
      • _nopad - PAD SPACE

      Variable Weighting (punctuation) flags

      • _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
      • _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
      • _vb — "Variable blanked" - variable collation elements are reset so that all weights (except for the identical level) are zero.

      Accent sensitivity flags

      • _ai — Accent insensitive - disables Level.
      • _as - Accent sensitive - enables Level 2.

      Case sensitivity flags

      • _ci - Case insensitive - disables Level 3.
      • _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
      • _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

      Identity sensitivity flags

      • _ii - identity insensitive - disables Level 5 (default)
      • _is - identity sensitivy - enables Level 5 (full binary equality)

      Canonical collation names

      The collation name parser will understand flags in the described above order, e.g.

      • _as_ci - correct
      • _ci_as - incorrect

      The canonical names (i.e. as displayed in SHOW CREATE statements or I_S queries) will also print flags in the order described above.

      The accent and case sensitivity flags will always be printed in canonical names, even with default values.

      Other flags will be printed only if they have a non-default value.

      Examples:

      • utf8mb4_uca1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
      • utf8mb4_uca1400_czech_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

      Disclaimer

      We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

      Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.

      Attachments

        Activity

          bar Alexander Barkov created issue -
          bar Alexander Barkov made changes -
          Field Original Value New Value
          Description Currently MariaDB has understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - _nopad_ for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB has understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - _nopad_ for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB has understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - _nopad_ for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - _nopad_ for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - _nopad_ for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4.

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          Note, the language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables Level 2.5 only consisting of the case characteristics (upper vs lower).

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)


          The accent and case sensitivity flags will always be printed in "canonical" displayed collation names, e.g. in {{SHOW CREATE TABLE}} output is I_S queries.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what variable really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Ccase difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - Variable characters are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - Variable characters are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts variable characters from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - Variable characters are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          serg Sergei Golubchik made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts variable characters from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - Variable characters are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          bar Alexander Barkov made changes -
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE


          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Disclaimer
          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in several steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.

          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Disclaimer
          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in several steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.

          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Disclaimer
          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Disclaimer
          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - shifts punctuation from Level 1 to Level 4 but does not enable Level 4. (TODO: let's double check what blanked really means in UCA)

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - variable collation elements are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          bar Alexander Barkov made changes -
          Summary New collation naming convention New naming convention for UCA collations
          bar Alexander Barkov made changes -
          bar Alexander Barkov made changes -
          bar Alexander Barkov made changes -
          bar Alexander Barkov made changes -
          serg Sergei Golubchik made changes -
          Fix Version/s 10.9 [ 26905 ]
          Fix Version/s 10.8 [ 26121 ]
          bar Alexander Barkov made changes -
          bar Alexander Barkov made changes -
          Description As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convension will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Optional CLDR language name (based on two-letter ISO 639-1 language codes). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Unicode version: two digit major version, one digit minor version, one digit patch version (e.g. 0900 for Unicode-9.0.0 or 1400 for Unicode-14.0.0).
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - variable collation elements are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level 2 (default).
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3 (default).
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand all flags (even default once) in any arbitrary order.

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_cs_1400_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Notes
          The language part will intentionally reside between the character set and the version, to avoid confusing languages with flags. E.g. in the last example, the first "cs" stands for Czech language, the second "cs" stands for case sensitivity.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          As of version 10.7, MariaDB understands the following flags in collation names:
          - _ci for case insensitive collations
          - _cs for case sensitive collations
          - {{_nopad\_}} for NO PAD collations

          We eventually want to support all customizations (collation preferences) as described in:
          https://unicode.org/reports/tr10/#Customization

          This new naming convention will encode more flags inside collation names.

          This new naming conversion will be applied to newly added UCA based collations. Old collation names will stay untouched.

          h2. Collation name structure
          The whole collation name structure will consist of the following parts delimited by underscores:

          - Character set name
          - Unicode collation algorithm version: letters "uca" followed by two digit major version, one digit minor version, one digit patch version (e.g. uca1400 for Unicode-14.0.0).
          - Optional tailoring name (usually a language name). This part will be omitted if the collation is based on a UCA collation without any language specific rules.
          - Flags, as described below


          h2. PAD flags
          - _pad - NO PAD (default)
          - _nopad - PAD SPACE

          h2. Variable Weighting (punctuation) flags
          - _vn — "Variable non-ignorable" - handles variable characters on Level 1 (default)
          - _vs — "Variable shifted" - shifts punctuation from Level 1 to Level 4 and enables Level 4.
          - _vb — "Variable blanked" - variable collation elements are reset so that all weights (except for the identical level) are zero.

          h2. Accent sensitivity flags
          - _ai — Accent insensitive - disables Level.
          - _as - Accent sensitive - enables Level 2.

          h2. Case sensitivity flags
          - _ci - Case insensitive - disables Level 3.
          - _cs — Case sensitive - enables Level 3. Case difference is handled according to tertiary weight, together with fullwidth, circled, square forms. See https://unicode.org/reports/tr10/#Tertiary_Weight_Table for details.
          - _co - Case only - enables a dedicated Level 2.5 only consisting of the case characteristics (upper vs lower), without other tertiary weight forms.

          h2. Identity sensitivity flags
          - _ii - identity insensitive - disables Level 5 (default)
          - _is - identity sensitivy - enables Level 5 (full binary equality)

          h2. Canonical collation names

          The collation name parser will understand flags in the described above order, e.g.
          - _as_ci - correct
          - _ci_as - incorrect

          The canonical names (i.e. as displayed in {{SHOW CREATE}} statements or {{I_S}} queries) will also print flags in the order described above.

          The accent and case sensitivity flags will always be printed in canonical names, even with default values.

          Other flags will be printed only if they have a non-default value.

          h2. Examples:
          - utf8mb4_uca1400_as_ci - a generic Unicode-14.0.0 collation, accent sensitive, case insensitive.
          - utf8mb4_uca1400_czech_nopad_vs_ai_cs_is - a Czech Unicode-14.0.0 collation, with punctuation shifted from Level 1 to Level 4, accent insensitive, case sensitive, identity sensitive.

          h2. Disclaimer
          We won't implement all flags mentioned here in a single patch. They will be added in steps under terms of different tasks.

          Variable weighting and Identity sensitivity flags will most likely be implemented later than other flags.
          serg Sergei Golubchik made changes -
          Fix Version/s 10.10 [ 27530 ]
          Fix Version/s 10.9 [ 26905 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 10.11 [ 27614 ]
          Fix Version/s 10.10 [ 27530 ]
          ralf.gebhardt Ralf Gebhardt made changes -
          Fix Version/s 11.2 [ 28603 ]
          Fix Version/s 10.11 [ 27614 ]
          ralf.gebhardt Ralf Gebhardt made changes -
          Fix Version/s 11.3 [ 28565 ]
          Fix Version/s 11.2 [ 28603 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 11.4 [ 29301 ]
          Fix Version/s 11.3 [ 28565 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 11.5 [ 29506 ]
          Fix Version/s 11.4 [ 29301 ]
          serg Sergei Golubchik made changes -
          Issue Type Task [ 3 ] New Feature [ 2 ]
          ralf.gebhardt Ralf Gebhardt made changes -
          Fix Version/s 11.6 [ 29515 ]
          Fix Version/s 11.5 [ 29506 ]
          ralf.gebhardt Ralf Gebhardt made changes -
          Fix Version/s 11.6 [ 29515 ]

          People

            bar Alexander Barkov
            bar Alexander Barkov
            Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.