MariaDB Server / MDEV-31340

Remove MY_COLLATION_HANDLER::strcasecmp()

Details

    Description

      There are two parallel comparison systems in the MariaDB collation library, both implemented as virtual functions in MY_COLLATION_HANDLER:

      • Comparison according to the collation, provided by these functions:

          int     (*strnncoll)(CHARSET_INFO *,
                               const uchar *, size_t, const uchar *, size_t, my_bool);
          int     (*strnncollsp)(CHARSET_INFO *,
                                 const uchar *, size_t, const uchar *, size_t);
         
          int     (*strnncollsp_nchars)(CHARSET_INFO *,
                                        const uchar *str1, size_t len1,
                                        const uchar *str2, size_t len2,
                                        size_t nchars,
                                        uint flags);
        

      • Case-insensitive (but accent-sensitive) comparison, implemented by this function:

          int  (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
        

        Note that the accent and case sensitivity of the collation do not matter here: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.
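      To make "accent-sensitive, case-insensitive" concrete, here is a minimal sketch of that comparison style. It is not the MariaDB implementation: the names sketch_tolower() and sketch_casecmp() are hypothetical, the input is assumed to be already decoded into code points, and the case mapping covers only U+0000..U+00FF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not MariaDB code): simple lowercase mapping,
   limited to ASCII and the Latin-1 range for illustration. */
uint32_t sketch_tolower(uint32_t cp)
{
  if (cp >= 'A' && cp <= 'Z')
    return cp + 0x20;
  /* U+00C0..U+00DE map to U+00E0..U+00FE, except U+00D7 (multiplication sign). */
  if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
    return cp + 0x20;
  return cp;
}

/* Compare two code point sequences after lowercasing each code point.
   Case differences are ignored; accents are kept, so 'e' and U+00E9 differ.
   Returns 0 when equal, a negative or positive value otherwise. */
int sketch_casecmp(const uint32_t *a, size_t alen,
                   const uint32_t *b, size_t blen)
{
  size_t i;
  size_t n = alen < blen ? alen : blen;
  for (i = 0; i < n; i++)
  {
    uint32_t x = sketch_tolower(a[i]);
    uint32_t y = sketch_tolower(b[i]);
    if (x != y)
      return x < y ? -1 : 1;
  }
  if (alen == blen)
    return 0;
  return alen < blen ? -1 : 1;
}
```

      With this sketch, 'E' and 'e' compare as equal, U+00C9 and U+00E9 compare as equal, but 'e' and U+00E9 do not, whatever the collation of the surrounding data would say.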

      These two parallel systems are redundant.

      Note that strcasecmp() is used mostly to compare identifiers, while the functions in the first group are used to compare data.

      Let's get rid of the second comparison system:
      1. Remove MY_COLLATION_HANDLER::strcasecmp()
      2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
      3. Replace all calls to:

      system_charset_info->coll->strcasecmp()
      

      with calls to

      my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
      

      The change would generally be quite mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. This part of the current task will switch some more C data types to LEX_CSTRING.
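      The kind of refactoring described above can be sketched as follows. This is an illustration under stated assumptions, not MariaDB code: sketch_lex_cstring is a stand-in for LEX_CSTRING (a pointer plus a length), sketch_from_cstr() and sketch_ident_cmp() are hypothetical names, and the comparison handles only ASCII case.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for LEX_CSTRING: a pointer and an explicit length.
   The string does not have to be 0-terminated. */
typedef struct
{
  const char *str;
  size_t length;
} sketch_lex_cstring;

/* Bridge for call sites that still hold a 0-terminated name. */
sketch_lex_cstring sketch_from_cstr(const char *s)
{
  sketch_lex_cstring r;
  r.str = s;
  r.length = strlen(s);
  return r;
}

/* ASCII-only lowercase, enough for this sketch. */
unsigned char sketch_ascii_tolower(unsigned char c)
{
  return (unsigned char) ((c >= 'A' && c <= 'Z') ? c + 0x20 : c);
}

/* Length-based, case-insensitive comparison in the style of the
   strnncoll family: it never reads past the given lengths. */
int sketch_ident_cmp(sketch_lex_cstring a, sketch_lex_cstring b)
{
  size_t i;
  size_t n = a.length < b.length ? a.length : b.length;
  for (i = 0; i < n; i++)
  {
    unsigned char x = sketch_ascii_tolower((unsigned char) a.str[i]);
    unsigned char y = sketch_ascii_tolower((unsigned char) b.str[i]);
    if (x != y)
      return x < y ? -1 : 1;
  }
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}
```

      One practical benefit of the pointer-and-length form is that a name embedded in a longer buffer (for example the table part of "db1.orders") can be compared in place, without copying it out and 0-terminating it first.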

      tolower vs toupper comparison

      Another option is to implement utf8mb3_general1400_as_ci so that it compares the uppercase forms of characters (instead of the lowercase forms).

      The difference is in a few dozen BMP characters. This script finds all those characters:

      -- One row per BMP code point (U+0000..U+FFFF)
      CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
      DELIMITER $$
      FOR i IN 0..0xFFFF
      DO
        INSERT INTO t1 VALUES (CHAR(i USING ucs2));
      END FOR;
      $$
      DELIMITER ;
      -- Flag the characters that change under LOWER() or UPPER()
      ALTER TABLE t1
        ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
        ADD KEY(has_casefolding, a);
       
      -- Collect the pairs on which lowercase and uppercase comparison disagree
      CREATE OR REPLACE TABLE t21 AS
      SELECT
        HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
        BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
        BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
        t1.a AS a, t2.a AS b
      FROM
        t1 t1, t1 t2
      WHERE
        t1.has_casefolding=1
      AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));
       
      SELECT
        HEX(CONVERT(a USING ucs2)) AS unicode_a,
        HEX(CONVERT(b USING ucs2)) AS unicode_b,
        t21.* FROM t21;
      

      +-----------+-----------+--------+--------+----------+----------+------+------+
      | unicode_a | unicode_b | hex_a  | hex_b  | eq_lower | eq_upper | a    | b    |
      +-----------+-----------+--------+--------+----------+----------+------+------+
      |      1E9E |      00DF | E1BA9E |   C39F |        1 |        0 |    ẞ |    ß |
      |      0399 |      0345 |   CE99 |   CD85 |        0 |        1 |    Ι |    ͅ |
      |      03B9 |      0345 |   CEB9 |   CD85 |        0 |        1 |    ι |    ͅ |
      |      1FBE |      0345 | E1BEBE |   CD85 |        0 |        1 |    ι |    ͅ |
      |      212B |      00C5 | E284AB |   C385 |        1 |        0 |    Å |    Å |
      |      212B |      00E5 | E284AB |   C3A5 |        1 |        0 |    Å |    å |
      |      00C5 |      212B |   C385 | E284AB |        1 |        0 |    Å |    Å |
      |      00E5 |      212B |   C3A5 | E284AB |        1 |        0 |    å |    Å |
      |      0130 |      0049 |   C4B0 |     49 |        1 |        0 |    İ |    I |
      |      0131 |      0049 |   C4B1 |     49 |        0 |        1 |    ı |    I |
      |      0130 |      0069 |   C4B0 |     69 |        1 |        0 |    İ |    i |
      |      0131 |      0069 |   C4B1 |     69 |        0 |        1 |    ı |    i |
      |      0049 |      0130 |     49 |   C4B0 |        1 |        0 |    I |    İ |
      |      0069 |      0130 |     69 |   C4B0 |        1 |        0 |    i |    İ |
      |      0049 |      0131 |     49 |   C4B1 |        0 |        1 |    I |    ı |
      |      0069 |      0131 |     69 |   C4B1 |        0 |        1 |    i |    ı |
      |      212A |      004B | E284AA |     4B |        1 |        0 |    K |    K |
      |      212A |      006B | E284AA |     6B |        1 |        0 |    K |    k |
      |      004B |      212A |     4B | E284AA |        1 |        0 |    K |    K |
      |      006B |      212A |     6B | E284AA |        1 |        0 |    k |    K |
      |      017F |      0053 |   C5BF |     53 |        0 |        1 |    ſ |    S |
      |      017F |      0073 |   C5BF |     73 |        0 |        1 |    ſ |    s |
      |      0053 |      017F |     53 |   C5BF |        0 |        1 |    S |    ſ |
      |      0073 |      017F |     73 |   C5BF |        0 |        1 |    s |    ſ |
      |      1E9B |      1E60 | E1BA9B | E1B9A0 |        0 |        1 |    ẛ |    Ṡ |
      |      1E9B |      1E61 | E1BA9B | E1B9A1 |        0 |        1 |    ẛ |    ṡ |
      |      1E60 |      1E9B | E1B9A0 | E1BA9B |        0 |        1 |    Ṡ |    ẛ |
      |      1E61 |      1E9B | E1B9A1 | E1BA9B |        0 |        1 |    ṡ |    ẛ |
      |      03D0 |      0392 |   CF90 |   CE92 |        0 |        1 |    ϐ |    Β |
      |      03D0 |      03B2 |   CF90 |   CEB2 |        0 |        1 |    ϐ |    β |
      |      0392 |      03D0 |   CE92 |   CF90 |        0 |        1 |    Β |    ϐ |
      |      03B2 |      03D0 |   CEB2 |   CF90 |        0 |        1 |    β |    ϐ |
      |      03F5 |      0395 |   CFB5 |   CE95 |        0 |        1 |    ϵ |    Ε |
      |      03F5 |      03B5 |   CFB5 |   CEB5 |        0 |        1 |    ϵ |    ε |
      |      0395 |      03F5 |   CE95 |   CFB5 |        0 |        1 |    Ε |    ϵ |
      |      03B5 |      03F5 |   CEB5 |   CFB5 |        0 |        1 |    ε |    ϵ |
      |      03D1 |      0398 |   CF91 |   CE98 |        0 |        1 |    ϑ |    Θ |
      |      03F4 |      0398 |   CFB4 |   CE98 |        1 |        0 |    ϴ |    Θ |
      |      03D1 |      03B8 |   CF91 |   CEB8 |        0 |        1 |    ϑ |    θ |
      |      03F4 |      03B8 |   CFB4 |   CEB8 |        1 |        0 |    ϴ |    θ |
      |      0398 |      03D1 |   CE98 |   CF91 |        0 |        1 |    Θ |    ϑ |
      |      03B8 |      03D1 |   CEB8 |   CF91 |        0 |        1 |    θ |    ϑ |
      |      0398 |      03F4 |   CE98 |   CFB4 |        1 |        0 |    Θ |    ϴ |
      |      03B8 |      03F4 |   CEB8 |   CFB4 |        1 |        0 |    θ |    ϴ |
      |      0345 |      0399 |   CD85 |   CE99 |        0 |        1 |    ͅ |    Ι |
      |      1FBE |      0399 | E1BEBE |   CE99 |        0 |        1 |    ι |    Ι |
      |      0345 |      03B9 |   CD85 |   CEB9 |        0 |        1 |    ͅ |    ι |
      |      1FBE |      03B9 | E1BEBE |   CEB9 |        0 |        1 |    ι |    ι |
      |      0345 |      1FBE |   CD85 | E1BEBE |        0 |        1 |    ͅ |    ι |
      |      0399 |      1FBE |   CE99 | E1BEBE |        0 |        1 |    Ι |    ι |
      |      03B9 |      1FBE |   CEB9 | E1BEBE |        0 |        1 |    ι |    ι |
      |      03F0 |      039A |   CFB0 |   CE9A |        0 |        1 |    ϰ |    Κ |
      |      03F0 |      03BA |   CFB0 |   CEBA |        0 |        1 |    ϰ |    κ |
      |      039A |      03F0 |   CE9A |   CFB0 |        0 |        1 |    Κ |    ϰ |
      |      03BA |      03F0 |   CEBA |   CFB0 |        0 |        1 |    κ |    ϰ |
      |      039C |      00B5 |   CE9C |   C2B5 |        0 |        1 |    Μ |    µ |
      |      03BC |      00B5 |   CEBC |   C2B5 |        0 |        1 |    μ |    µ |
      |      00B5 |      039C |   C2B5 |   CE9C |        0 |        1 |    µ |    Μ |
      |      00B5 |      03BC |   C2B5 |   CEBC |        0 |        1 |    µ |    μ |
      |      03D6 |      03A0 |   CF96 |   CEA0 |        0 |        1 |    ϖ |    Π |
      |      03D6 |      03C0 |   CF96 |   CF80 |        0 |        1 |    ϖ |    π |
      |      03A0 |      03D6 |   CEA0 |   CF96 |        0 |        1 |    Π |    ϖ |
      |      03C0 |      03D6 |   CF80 |   CF96 |        0 |        1 |    π |    ϖ |
      |      03F1 |      03A1 |   CFB1 |   CEA1 |        0 |        1 |    ϱ |    Ρ |
      |      03F1 |      03C1 |   CFB1 |   CF81 |        0 |        1 |    ϱ |    ρ |
      |      03A1 |      03F1 |   CEA1 |   CFB1 |        0 |        1 |    Ρ |    ϱ |
      |      03C1 |      03F1 |   CF81 |   CFB1 |        0 |        1 |    ρ |    ϱ |
      |      03C2 |      03A3 |   CF82 |   CEA3 |        0 |        1 |    ς |    Σ |
      |      03A3 |      03C2 |   CEA3 |   CF82 |        0 |        1 |    Σ |    ς |
      |      03C3 |      03C2 |   CF83 |   CF82 |        0 |        1 |    σ |    ς |
      |      03C2 |      03C3 |   CF82 |   CF83 |        0 |        1 |    ς |    σ |
      |      03D5 |      03A6 |   CF95 |   CEA6 |        0 |        1 |    ϕ |    Φ |
      |      03D5 |      03C6 |   CF95 |   CF86 |        0 |        1 |    ϕ |    φ |
      |      03A6 |      03D5 |   CEA6 |   CF95 |        0 |        1 |    Φ |    ϕ |
      |      03C6 |      03D5 |   CF86 |   CF95 |        0 |        1 |    φ |    ϕ |
      |      2126 |      03A9 | E284A6 |   CEA9 |        1 |        0 |    Ω |    Ω |
      |      2126 |      03C9 | E284A6 |   CF89 |        1 |        0 |    Ω |    ω |
      |      03A9 |      2126 |   CEA9 | E284A6 |        1 |        0 |    Ω |    Ω |
      |      03C9 |      2126 |   CF89 | E284A6 |        1 |        0 |    ω |    Ω |
      |      1C80 |      0412 | E1B280 |   D092 |        0 |        1 |    ᲀ |    В |
      |      1C80 |      0432 | E1B280 |   D0B2 |        0 |        1 |    ᲀ |    в |
      |      0412 |      1C80 |   D092 | E1B280 |        0 |        1 |    В |    ᲀ |
      |      0432 |      1C80 |   D0B2 | E1B280 |        0 |        1 |    в |    ᲀ |
      |      1C81 |      0414 | E1B281 |   D094 |        0 |        1 |    ᲁ |    Д |
      |      1C81 |      0434 | E1B281 |   D0B4 |        0 |        1 |    ᲁ |    д |
      |      0414 |      1C81 |   D094 | E1B281 |        0 |        1 |    Д |    ᲁ |
      |      0434 |      1C81 |   D0B4 | E1B281 |        0 |        1 |    д |    ᲁ |
      |      1C82 |      041E | E1B282 |   D09E |        0 |        1 |    ᲂ |    О |
      |      1C82 |      043E | E1B282 |   D0BE |        0 |        1 |    ᲂ |    о |
      |      041E |      1C82 |   D09E | E1B282 |        0 |        1 |    О |    ᲂ |
      |      043E |      1C82 |   D0BE | E1B282 |        0 |        1 |    о |    ᲂ |
      |      1C83 |      0421 | E1B283 |   D0A1 |        0 |        1 |    ᲃ |    С |
      |      1C83 |      0441 | E1B283 |   D181 |        0 |        1 |    ᲃ |    с |
      |      0421 |      1C83 |   D0A1 | E1B283 |        0 |        1 |    С |    ᲃ |
      |      0441 |      1C83 |   D181 | E1B283 |        0 |        1 |    с |    ᲃ |
      |      1C84 |      0422 | E1B284 |   D0A2 |        0 |        1 |    ᲄ |    Т |
      |      1C85 |      0422 | E1B285 |   D0A2 |        0 |        1 |    ᲅ |    Т |
      |      1C84 |      0442 | E1B284 |   D182 |        0 |        1 |    ᲄ |    т |
      |      1C85 |      0442 | E1B285 |   D182 |        0 |        1 |    ᲅ |    т |
      |      0422 |      1C84 |   D0A2 | E1B284 |        0 |        1 |    Т |    ᲄ |
      |      0442 |      1C84 |   D182 | E1B284 |        0 |        1 |    т |    ᲄ |
      |      1C85 |      1C84 | E1B285 | E1B284 |        0 |        1 |    ᲅ |    ᲄ |
      |      0422 |      1C85 |   D0A2 | E1B285 |        0 |        1 |    Т |    ᲅ |
      |      0442 |      1C85 |   D182 | E1B285 |        0 |        1 |    т |    ᲅ |
      |      1C84 |      1C85 | E1B284 | E1B285 |        0 |        1 |    ᲄ |    ᲅ |
      |      A64A |      1C88 | EA998A | E1B288 |        0 |        1 |    Ꙋ |    ᲈ |
      |      A64B |      1C88 | EA998B | E1B288 |        0 |        1 |    ꙋ |    ᲈ |
      |      1C88 |      A64A | E1B288 | EA998A |        0 |        1 |    ᲈ |    Ꙋ |
      |      1C88 |      A64B | E1B288 | EA998B |        0 |        1 |    ᲈ |    ꙋ |
      |      1C86 |      042A | E1B286 |   D0AA |        0 |        1 |    ᲆ |    Ъ |
      |      1C86 |      044A | E1B286 |   D18A |        0 |        1 |    ᲆ |    ъ |
      |      042A |      1C86 |   D0AA | E1B286 |        0 |        1 |    Ъ |    ᲆ |
      |      044A |      1C86 |   D18A | E1B286 |        0 |        1 |    ъ |    ᲆ |
      |      1C87 |      0462 | E1B287 |   D1A2 |        0 |        1 |    ᲇ |    Ѣ |
      |      1C87 |      0463 | E1B287 |   D1A3 |        0 |        1 |    ᲇ |    ѣ |
      |      0462 |      1C87 |   D1A2 | E1B287 |        0 |        1 |    Ѣ |    ᲇ |
      |      0463 |      1C87 |   D1A3 | E1B287 |        0 |        1 |    ѣ |    ᲇ |
      +-----------+-----------+--------+--------+----------+----------+------+------+
      

      Note that the toupper comparison treats more of these pairs as equal than the tolower comparison does:

      SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
      

      +---------------+---------------+
      | SUM(eq_lower) | SUM(eq_upper) |
      +---------------+---------------+
      |            21 |            96 |
      +---------------+---------------+
      

      A more compact table with distinct pairs (the character with the smaller code point is on the left):

      SELECT
        HEX(CONVERT(a USING ucs2)) AS unicode_a,
        HEX(CONVERT(b USING ucs2)) AS unicode_b,
        HEX(a), HEX(b),
        BINARY LOWER(a)=LOWER(b) AS eq_lower,
        BINARY UPPER(a)=UPPER(b) AS eq_upper,
        a,b
      FROM
      (
        SELECT DISTINCT IF(BINARY a<b,a,b) AS a, IF(BINARY a>=b,a,b) AS b FROM t21
      ) d1
      ORDER BY eq_lower, unicode_a, unicode_b;
      

      +-----------+-----------+--------+--------+----------+----------+-----+-----+
      | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a   | b   |
      +-----------+-----------+--------+--------+----------+----------+-----+-----+
      |      0049 |      0131 |     49 |   C4B1 |        0 |        1 |   I |   ı |
      |      0053 |      017F |     53 |   C5BF |        0 |        1 |   S |   ſ |
      |      00B5 |      039C |   C2B5 |   CE9C |        0 |        1 |   µ |   Μ |
      |      0345 |      0399 |   CD85 |   CE99 |        0 |        1 |   ͅ |   Ι |
      |      0392 |      03D0 |   CE92 |   CF90 |        0 |        1 |   Β |   ϐ |
      |      0395 |      03F5 |   CE95 |   CFB5 |        0 |        1 |   Ε |   ϵ |
      |      0398 |      03D1 |   CE98 |   CF91 |        0 |        1 |   Θ |   ϑ |
      |      0399 |      1FBE |   CE99 | E1BEBE |        0 |        1 |   Ι |   ι |
      |      039A |      03F0 |   CE9A |   CFB0 |        0 |        1 |   Κ |   ϰ |
      |      03A0 |      03D6 |   CEA0 |   CF96 |        0 |        1 |   Π |   ϖ |
      |      03A1 |      03F1 |   CEA1 |   CFB1 |        0 |        1 |   Ρ |   ϱ |
      |      03A3 |      03C2 |   CEA3 |   CF82 |        0 |        1 |   Σ |   ς |
      |      03A6 |      03D5 |   CEA6 |   CF95 |        0 |        1 |   Φ |   ϕ |
      |      0412 |      1C80 |   D092 | E1B280 |        0 |        1 |   В |   ᲀ |
      |      0414 |      1C81 |   D094 | E1B281 |        0 |        1 |   Д |   ᲁ |
      |      041E |      1C82 |   D09E | E1B282 |        0 |        1 |   О |   ᲂ |
      |      0421 |      1C83 |   D0A1 | E1B283 |        0 |        1 |   С |   ᲃ |
      |      0422 |      1C84 |   D0A2 | E1B284 |        0 |        1 |   Т |   ᲄ |
      |      042A |      1C86 |   D0AA | E1B286 |        0 |        1 |   Ъ |   ᲆ |
      |      0462 |      1C87 |   D1A2 | E1B287 |        0 |        1 |   Ѣ |   ᲇ |
      |      1C88 |      A64A | E1B288 | EA998A |        0 |        1 |   ᲈ |   Ꙋ |
      |      0049 |      0130 |     49 |   C4B0 |        1 |        0 |   I |   İ |
      |      004B |      212A |     4B | E284AA |        1 |        0 |   K |   K |
      |      00C5 |      212B |   C385 | E284AB |        1 |        0 |   Å |   Å |
      |      00DF |      1E9E |   C39F | E1BA9E |        1 |        0 |   ß |   ẞ |
      |      03A9 |      2126 |   CEA9 | E284A6 |        1 |        0 |   Ω |   Ω |
      +-----------+-----------+--------+--------+----------+----------+-----+-----+
      26 rows in set (0.005 sec)
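
      The disagreement between the two styles can be reproduced outside the server with a small sketch. It is only an illustration: the function names are hypothetical, and the mapping tables contain just ASCII plus two code points taken from the tables above (U+212A KELVIN SIGN, which lowercases to 'k' but has no uppercase mapping, and U+017F LATIN SMALL LETTER LONG S, which uppercases to 'S' but has no lowercase mapping).

```c
#include <stdint.h>

/* Hypothetical, deliberately tiny lowercase mapping. */
uint32_t sketch_fold_lower(uint32_t cp)
{
  if (cp >= 'A' && cp <= 'Z')
    return cp + 0x20;
  if (cp == 0x212A)            /* KELVIN SIGN -> 'k' */
    return 0x006B;
  return cp;                   /* U+017F stays as is */
}

/* Hypothetical, deliberately tiny uppercase mapping. */
uint32_t sketch_fold_upper(uint32_t cp)
{
  if (cp >= 'a' && cp <= 'z')
    return cp - 0x20;
  if (cp == 0x017F)            /* LONG S -> 'S' */
    return 0x0053;
  return cp;                   /* U+212A stays as is */
}

/* Equality in the tolower style (what strcasecmp() did). */
int sketch_eq_lower(uint32_t a, uint32_t b)
{
  return sketch_fold_lower(a) == sketch_fold_lower(b);
}

/* Equality in the toupper style. */
int sketch_eq_upper(uint32_t a, uint32_t b)
{
  return sketch_fold_upper(a) == sketch_fold_upper(b);
}
```

      Under these mappings, U+212A equals 'k' only in the tolower style and U+017F equals 's' only in the toupper style, matching the eq_lower and eq_upper columns for those rows. Ordinary pairs such as 'K' and 'k' are equal in both styles.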
      

      Trade-offs of toupper vs tolower comparison

      • The tolower variant compares exactly as strcasecmp() did
      • The toupper variant compares similarly to how utf8mb3_general_ci works

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            bar Alexander Barkov made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.3 [ 28565 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Issue Type New Feature [ 2 ] Task [ 3 ]
            bar Alexander Barkov made changes -
            Description
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Case-insensitive (but accent-sensitive) comparison, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation does not matter: strcasecmp() always compares in an accent-sensitive, case-insensitive style.

            These two parallel systems are redundant.

            Note that strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note that it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code}

            The change would generally be quite mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the current task will switch some more C data types to LEX_CSTRING.
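
The refactoring described above can be illustrated with a minimal sketch. It is not the MariaDB implementation: name_ref is a hypothetical stand-in for LEX_CSTRING, name_ref_from_cstr() and name_ref_casecmp() are invented for illustration, and case folding is limited to ASCII via the C library, whereas the proposed collation would have to cover the entire Unicode range.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LEX_CSTRING: a pointer plus an explicit length. */
typedef struct
{
  const char *str;
  size_t length;
} name_ref;

/* Wrap a 0-terminated string; strlen() is paid once, at the boundary. */
static name_ref name_ref_from_cstr(const char *s)
{
  name_ref r;
  assert(s != NULL);
  r.str= s;
  r.length= strlen(s);
  return r;
}

/*
  ASCII-only case-insensitive comparison of two length-delimited names.
  Returns <0, 0 or >0. Assumption: a real collation function would fold
  all of Unicode instead of calling tolower() on single bytes.
*/
static int name_ref_casecmp(name_ref a, name_ref b)
{
  size_t n= a.length < b.length ? a.length : b.length;
  size_t i;
  for (i= 0; i < n; i++)
  {
    int ca= tolower((unsigned char) a.str[i]);
    int cb= tolower((unsigned char) b.str[i]);
    if (ca != cb)
      return ca < cb ? -1 : 1;
  }
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}
```

Because the length travels with the pointer, a caller can compare a prefix of a larger buffer without copying it or writing a terminating 0 into it, which is not possible with a strcasecmp()-style interface.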

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare upper-case forms (instead of lower-case forms).


            The two approaches differ for a few dozen BMP characters. This script finds all of them:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}


            Note that the toupper comparison considers more pairs equal than the tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}


            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
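
Why the two variants disagree can be sketched with two rows of the table above. The functions below are hypothetical and hard-code the simple (one-to-one) case mappings for ASCII plus KELVIN SIGN (U+212A) and LATIN SMALL LETTER LONG S (U+017F) only. They are an illustration under those assumptions, not the server's case-folding tables.

```c
#include <stdint.h>

/* Simple lower-case mapping: ASCII plus KELVIN SIGN; all else maps to itself. */
static uint32_t sketch_tolower(uint32_t c)
{
  if (c >= 'A' && c <= 'Z')
    return c + ('a' - 'A');
  if (c == 0x212A)              /* KELVIN SIGN lowers to 'k' */
    return 'k';
  return c;                     /* U+017F LONG S is already lower case */
}

/* Simple upper-case mapping: ASCII plus LONG S; all else maps to itself. */
static uint32_t sketch_toupper(uint32_t c)
{
  if (c >= 'a' && c <= 'z')
    return c - ('a' - 'A');
  if (c == 0x017F)              /* LONG S uppers to 'S' */
    return 'S';
  return c;                     /* U+212A KELVIN SIGN is already upper case */
}

/* Equality after folding both sides to lower case (the tolower variant). */
static int eq_lower(uint32_t a, uint32_t b)
{
  return sketch_tolower(a) == sketch_tolower(b);
}

/* Equality after folding both sides to upper case (the toupper variant). */
static int eq_upper(uint32_t a, uint32_t b)
{
  return sketch_toupper(a) == sketch_toupper(b);
}
```

With these mappings, KELVIN SIGN equals 'k' only under lower-case folding, and LONG S equals 's' only under upper-case folding, matching the rows E284AA/6B and C5BF/73 in the table.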
            bar Alexander Barkov made changes -
            Description
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Case-insensitive (but accent-sensitive) comparison, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation does not matter: strcasecmp() always compares in an accent-sensitive, case-insensitive style.

            These two parallel systems are redundant.

            Note that strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note that it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code}

            The change would generally be quite mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare upper-case forms (instead of lower-case forms).


            The two approaches differ for a few dozen BMP characters. This script finds all of them:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}


            Note that, among these differing pairs, the toupper comparison treats more pairs as equal than the tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            |            21 |            96 |
            +---------------+---------------+
            {noformat}

            h2. Trade-offs of toupper vs tolower comparison
            - The tolower variant will compare exactly as strcasecmp() did
            - The toupper variant will compare similarly to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                               const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                 const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                        const uchar *str1, size_t len1,
                                        const uchar *str2, size_t len2,
                                        size_t nchars,
                                        uint flags);
            {code}


            - Comparison in a case-insensitive (but accent-sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always compares in an accent-sensitive, case-insensitive style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change is mostly mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions take a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of a plain const char pointer) to store names, so this part of the task will switch some more C data types to LEX_CSTRING.
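The refactoring issue described above can be illustrated with a minimal, self-contained sketch. It does not use the MariaDB API: NameRef, name_from_cstr() and compare_names_ci() are hypothetical names invented for this illustration, and the comparison is ASCII-only rather than a real collation. It only shows why callers holding a 0-terminated string need an adapter (and a strlen() call) before they can use a pointer-plus-length interface, while callers that already store the length do not.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a pointer-plus-length name
// (the general idea behind LEX_CSTRING, not its actual definition).
struct NameRef
{
  const char *str;
  std::size_t length;
};

// Hypothetical adapter for callers that only have a 0-terminated string.
// This is the one place where strlen() has to be paid.
inline NameRef name_from_cstr(const char *s)
{
  assert(s != nullptr);
  return NameRef{s, std::strlen(s)};
}

// ASCII-only case-insensitive comparison that takes explicit lengths,
// in the spirit of the strnncoll-like interfaces.
// Returns a negative value, 0, or a positive value.
inline int compare_names_ci(const NameRef &a, const NameRef &b)
{
  assert(a.str != nullptr && b.str != nullptr);
  const std::size_t min_len = a.length < b.length ? a.length : b.length;
  for (std::size_t i = 0; i < min_len; i++)
  {
    // Cast through unsigned char so std::tolower() never sees a negative value.
    const int ca = std::tolower(static_cast<unsigned char>(a.str[i]));
    const int cb = std::tolower(static_cast<unsigned char>(b.str[i]));
    if (ca != cb)
      return ca < cb ? -1 : 1;
  }
  // All shared bytes are equal: the shorter name sorts first.
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}
```

With such a shape, a call like compare_names_ci(name_from_cstr("Table1"), name_from_cstr("TABLE1")) returns 0, and a name stored with its length can be compared without any 0 terminator at all, which is the reason for moving more data types to a pointer-plus-length representation.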

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare uppercased characters (instead of lowercased ones).


            The two approaches differ for only a few dozen BMP characters. The following script finds all of them:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that, among these differing pairs, the toupper comparison treats more pairs as equal than the tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            |            21 |            96 |
            +---------------+---------------+
            {noformat}

            h2. Trade-offs of toupper vs tolower comparison
            - The tolower variant will compare exactly as strcasecmp() did
            - The toupper variant will compare similarly to how utf8mb3_general_ci works
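To make the difference between the two variants concrete, here is a small sketch that reproduces four rows of the table above (U+017F and U+212A against s, S, k and K). The functions demo_tolower(), demo_toupper(), eq_lower(), eq_upper() and run_table_checks() are hypothetical names for this illustration; the mappings cover only the listed code points and are not the server's case-folding tables.

```cpp
#include <cassert>

// Simple one-to-one lowercase mapping for the few code points used here.
// Anything not listed maps to itself, which is enough for this demo only.
inline char32_t demo_tolower(char32_t c)
{
  switch (c)
  {
  case U'S':   return U's';
  case U'K':   return U'k';
  case 0x212A: return U'k';  // KELVIN SIGN lowercases to k
  default:     return c;     // includes U+017F, which has no lowercase mapping
  }
}

// Simple one-to-one uppercase mapping for the same code points.
inline char32_t demo_toupper(char32_t c)
{
  switch (c)
  {
  case U's':   return U'S';
  case U'k':   return U'K';
  case 0x017F: return U'S';  // LATIN SMALL LETTER LONG S uppercases to S
  default:     return c;     // includes U+212A, which is already uppercase
  }
}

// Equality as seen by a tolower-based and a toupper-based comparison.
inline bool eq_lower(char32_t a, char32_t b)
{
  return demo_tolower(a) == demo_tolower(b);
}

inline bool eq_upper(char32_t a, char32_t b)
{
  return demo_toupper(a) == demo_toupper(b);
}

// Checks mirroring the eq_lower / eq_upper columns of four table rows.
inline void run_table_checks()
{
  assert(!eq_lower(0x017F, U's') && eq_upper(0x017F, U's'));  // 017F vs 0073: 0, 1
  assert(!eq_lower(0x017F, U'S') && eq_upper(0x017F, U'S'));  // 017F vs 0053: 0, 1
  assert(eq_lower(0x212A, U'k') && !eq_upper(0x212A, U'k'));  // 212A vs 006B: 1, 0
  assert(eq_lower(0x212A, U'K') && !eq_upper(0x212A, U'K'));  // 212A vs 004B: 1, 0
}
```

The long s is only reachable from s through uppercasing, while the Kelvin sign is only reachable from k through lowercasing, so neither variant is a superset of the other; they merely disagree on different pairs.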
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                               const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                 const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                        const uchar *str1, size_t len1,
                                        const uchar *str2, size_t len2,
                                        size_t nchars,
                                        uint flags);
            {code}


            - Comparison in a case-insensitive (but accent-sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always compares in an accent-sensitive, case-insensitive style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change is mostly mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions take a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of a plain const char pointer) to store names, so this part of the task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare uppercased characters (instead of lowercased ones).


            The two approaches differ for only a few dozen BMP characters. The following script finds all of them:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that toupper comparison treats more of these pairs as equal than tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            |            21 |            96 |
            +---------------+---------------+
            {noformat}

            h2. Pros and cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare similarly to utf8mb3_general_ci
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Case-insensitive (but accent-sensitive) comparison, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.

            These two parallel systems are redundant.

            Note that strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the current task will switch some more C data types to LEX_CSTRING.
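
The mismatch between 0-terminated strings and pointer-plus-length arguments can be illustrated with a minimal C++ sketch. Everything in it is an assumption made for illustration: NameRef is a hypothetical stand-in for a LEX_CSTRING-like type, compare_ci() and from_cstr() are invented helpers, and the case folding is ASCII-only. It is not the MariaDB collation API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a LEX_CSTRING-like name: pointer plus length.
struct NameRef
{
  const char *str;
  std::size_t length;
};

// ASCII-only case folding, for illustration only. The proposed collation is
// meant to cover U+0000..U+10FFFF, which this sketch does not attempt.
inline unsigned char ascii_lower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a') : c;
}

// Length-based, case-insensitive comparison: returns <0, 0 or >0.
// It never reads past the given lengths, so no terminating 0 is required.
inline int compare_ci(const NameRef &a, const NameRef &b)
{
  const std::size_t n = a.length < b.length ? a.length : b.length;
  for (std::size_t i = 0; i < n; i++)
  {
    const int ca = ascii_lower(static_cast<unsigned char>(a.str[i]));
    const int cb = ascii_lower(static_cast<unsigned char>(b.str[i]));
    if (ca != cb)
      return ca - cb;
  }
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}

// Adapter for call sites that still hold a 0-terminated string.
inline NameRef from_cstr(const char *s)
{
  assert(s != nullptr);
  return NameRef{s, std::strlen(s)};
}
```

A call site that still holds 0-terminated names could go through from_cstr(), for example compare_ci(from_cstr("Table1"), from_cstr("TABLE1")), which returns 0 here. Keeping names as pointer plus length in the first place avoids the repeated strlen() calls.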

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare upper-cased strings (instead of lower-cased ones).

            The two approaches differ for a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that toupper comparison treats more of these pairs as equal than tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            |            21 |            96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a, IF(BINARY a>=b,a,b) AS b FROM t21
            ) d1
            ORDER BY eq_lower, a, b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}
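
The asymmetry between the two comparison styles can be reproduced with a small C++ sketch. The mapping table is hand-copied for four characters that appear in the result above (U+0130, U+0131, U+212A and U+017F) plus their ASCII partners. The names CaseEntry, to_lower(), to_upper(), eq_lower(), eq_upper() and demo() are hypothetical, and a real implementation would rely on complete case mapping tables rather than this handful of entries.

```cpp
#include <cassert>

// One row of a tiny, hand-picked case mapping table (not a full Unicode table).
struct CaseEntry
{
  char32_t cp;     // code point
  char32_t lower;  // simple lower case mapping
  char32_t upper;  // simple upper case mapping
};

const CaseEntry case_table[] = {
  {0x0049, 0x0069, 0x0049},  // LATIN CAPITAL LETTER I
  {0x0069, 0x0069, 0x0049},  // LATIN SMALL LETTER I
  {0x0130, 0x0069, 0x0130},  // LATIN CAPITAL LETTER I WITH DOT ABOVE
  {0x0131, 0x0131, 0x0049},  // LATIN SMALL LETTER DOTLESS I
  {0x004B, 0x006B, 0x004B},  // LATIN CAPITAL LETTER K
  {0x006B, 0x006B, 0x004B},  // LATIN SMALL LETTER K
  {0x212A, 0x006B, 0x212A},  // KELVIN SIGN
  {0x0053, 0x0073, 0x0053},  // LATIN CAPITAL LETTER S
  {0x0073, 0x0073, 0x0053},  // LATIN SMALL LETTER S
  {0x017F, 0x017F, 0x0053},  // LATIN SMALL LETTER LONG S
};

// Code points missing from the table map to themselves.
inline char32_t to_lower(char32_t cp)
{
  for (const CaseEntry &e : case_table)
    if (e.cp == cp)
      return e.lower;
  return cp;
}

inline char32_t to_upper(char32_t cp)
{
  for (const CaseEntry &e : case_table)
    if (e.cp == cp)
      return e.upper;
  return cp;
}

// The two comparison styles under discussion, for single code points.
inline bool eq_lower(char32_t a, char32_t b) { return to_lower(a) == to_lower(b); }
inline bool eq_upper(char32_t a, char32_t b) { return to_upper(a) == to_upper(b); }

// Checks the four rows quoted in the lead-in against the result table.
inline void demo()
{
  assert(eq_lower(0x0130, 0x0049) && !eq_upper(0x0130, 0x0049));
  assert(!eq_lower(0x0131, 0x0049) && eq_upper(0x0131, 0x0049));
  assert(eq_lower(0x212A, 0x004B) && !eq_upper(0x212A, 0x004B));
  assert(!eq_lower(0x017F, 0x0053) && eq_upper(0x017F, 0x0053));
}
```

The pairs that are equal only under tolower are those where a character has a lower case mapping but no upper case mapping (such as the Kelvin sign), and the pairs that are equal only under toupper are the reverse case (such as the long s).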

            h2. Pros and cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare similarly to utf8mb3_general_ci
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that the toupper comparison treats more pairs as equal than the tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, a, b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the current task will switch some more C data types to LEX_CSTRING.
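The refactoring described above can be illustrated with a minimal sketch. Everything in it is hypothetical and only mimics the shapes involved: `NameRef` stands in for a LEX_CSTRING-style pointer-plus-length name, `name_from_cstr()` is an illustrative helper, and `compare_names_ci()` is a toy ASCII-only comparison, not the real collation handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a LEX_CSTRING-style name: pointer plus length.
struct NameRef
{
  const char *str;
  size_t length;
};

// Hypothetical helper: wrap a 0-terminated string, measuring it once.
static NameRef name_from_cstr(const char *s)
{
  return NameRef{s, std::strlen(s)};
}

// Toy ASCII-only lower-case fold (assumption: a real collation would
// cover the entire Unicode range instead).
static unsigned char fold_ascii(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A'))
                                : c;
}

// Toy comparison with a strnncoll-like shape: it never looks for a
// terminating 0 byte and relies only on the explicit lengths.
// Returns a negative value, 0, or a positive value.
static int compare_names_ci(const NameRef &a, const NameRef &b)
{
  size_t n = a.length < b.length ? a.length : b.length;
  for (size_t i = 0; i < n; i++)
  {
    int ca = fold_ascii(static_cast<unsigned char>(a.str[i]));
    int cb = fold_ascii(static_cast<unsigned char>(b.str[i]));
    if (ca != cb)
      return ca - cb;
  }
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}
```

A caller that still holds a 0-terminated name would wrap it once with `name_from_cstr()` and pass the result on. Data types that already store the length avoid the repeated `strlen()` call, which is one motivation for keeping names as pointer-plus-length pairs.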

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare the upper-cased (instead of the lower-cased) characters.


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that the toupper comparison treats more pairs as equal than the tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}
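To make the difference concrete, here is a small sketch that reproduces four rows of the table above (S vs ſ, s vs ſ, K vs the Kelvin sign, k vs the Kelvin sign). The mapping functions are hypothetical and cover only the handful of code points needed for the illustration; they are not the server's case-folding tables.

```cpp
#include <cassert>

// Simple one-to-one lower-case mapping for a few code points only.
// Any code point not listed maps to itself.
static char32_t to_lower_simple(char32_t c)
{
  switch (c)
  {
  case U'S':   return U's';
  case U'K':   return U'k';
  case 0x212A: return U'k';  // U+212A KELVIN SIGN lower-cases to 'k'
  default:     return c;     // includes U+017F, which is already lower case
  }
}

// Simple one-to-one upper-case mapping for the same few code points.
static char32_t to_upper_simple(char32_t c)
{
  switch (c)
  {
  case U's':   return U'S';
  case U'k':   return U'K';
  case 0x017F: return U'S';  // U+017F LATIN SMALL LETTER LONG S upper-cases to 'S'
  default:     return c;     // includes U+212A, which is already upper case
  }
}

// Equality after folding both sides to lower case (the tolower style).
static bool eq_lower(char32_t a, char32_t b)
{
  return to_lower_simple(a) == to_lower_simple(b);
}

// Equality after folding both sides to upper case (the toupper style).
static bool eq_upper(char32_t a, char32_t b)
{
  return to_upper_simple(a) == to_upper_simple(b);
}
```

With these mappings, `eq_lower(U'S', 0x017F)` is false while `eq_upper(U'S', 0x017F)` is true, and the Kelvin sign behaves the opposite way: it matches K under tolower but not under toupper.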

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci, which would compare the upper-cased (instead of the lower-cased) characters.


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that the toupper comparison treats more pairs as equal than the tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Case-insensitive (but accent-sensitive) comparison, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.

            These two parallel systems are redundant.

            Note that strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general_as_ci. Note that it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_general_as_ci->coll->strnncoll***()
            {code}

            The change should be largely mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note that Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the task will switch some more C data types to LEX_CSTRING.
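
            The pointer-plus-length refactoring described above could look roughly like the sketch below. It is only an illustration, not the MariaDB code: NameRef is a hypothetical stand-in for LEX_CSTRING, compare_ci() is a hypothetical ASCII-only stand-in for a strnncoll()-style function, and names_equal_cstr() and names_equal() are made-up helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for LEX_CSTRING: a name stored as pointer + length.
struct NameRef
{
  const char *str;
  std::size_t length;
};

// ASCII-only case folding, used only to keep this sketch self-contained.
unsigned char ascii_lower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) (c + ('a' - 'A')) : c;
}

// Hypothetical stand-in for a strnncoll()-style comparator:
// takes pointer + length for both arguments and returns <0, 0 or >0.
int compare_ci(const unsigned char *a, std::size_t alen,
               const unsigned char *b, std::size_t blen)
{
  std::size_t n= alen < blen ? alen : blen;
  for (std::size_t i= 0; i < n; i++)
  {
    unsigned char ca= ascii_lower(a[i]), cb= ascii_lower(b[i]);
    if (ca != cb)
      return ca < cb ? -1 : 1;
  }
  return alen == blen ? 0 : (alen < blen ? -1 : 1);
}

// Old style: the caller only has 0-terminated strings, so the lengths
// have to be recomputed with strlen() on every comparison.
bool names_equal_cstr(const char *a, const char *b)
{
  assert(a != nullptr && b != nullptr);
  return compare_ci((const unsigned char *) a, std::strlen(a),
                    (const unsigned char *) b, std::strlen(b)) == 0;
}

// New style: the lengths are already stored next to the pointers.
bool names_equal(const NameRef &a, const NameRef &b)
{
  return compare_ci((const unsigned char *) a.str, a.length,
                    (const unsigned char *) b.str, b.length) == 0;
}
```

            With names kept as pointer plus length, a call such as names_equal(NameRef{"Table1", 6}, NameRef{"TABLE1", 6}) needs no strlen() at the call site, which is the main reason the switch to LEX_CSTRING fits this task.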

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci, which would compare characters by their upper case (instead of their lower case).


            The two variants differ for a few dozen BMP characters. This script finds all of them:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that toupper comparison treats more pairs as equal than tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}
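
            The effect can be reproduced outside the server with a tiny sketch. The mapping table below is hand-copied from a few rows of the output above (S, s, ſ and K, k, the Kelvin sign); CaseEntry, fold(), eq_upper(), eq_lower() and check_examples() are hypothetical names used only for this illustration, and the code is not how the collation library implements case folding.

```cpp
#include <cassert>
#include <cstddef>

// One code point with its simple upper-case and lower-case mappings.
struct CaseEntry
{
  char32_t cp, upper, lower;
};

// A handful of entries taken from the result table above.
static const CaseEntry case_table[]=
{
  {0x0053, 0x0053, 0x0073}, // S
  {0x0073, 0x0053, 0x0073}, // s
  {0x017F, 0x0053, 0x017F}, // LATIN SMALL LETTER LONG S: upper is S, no lower mapping
  {0x004B, 0x004B, 0x006B}, // K
  {0x006B, 0x004B, 0x006B}, // k
  {0x212A, 0x212A, 0x006B}  // KELVIN SIGN: lower is k, no upper mapping
};

// Fold a code point to upper or lower case; unknown code points map to themselves.
char32_t fold(char32_t cp, bool to_upper)
{
  for (const CaseEntry &e : case_table)
    if (e.cp == cp)
      return to_upper ? e.upper : e.lower;
  return cp;
}

// toupper-style equality: compare the upper-case forms.
bool eq_upper(char32_t a, char32_t b)
{
  return fold(a, true) == fold(b, true);
}

// tolower-style equality: compare the lower-case forms.
bool eq_lower(char32_t a, char32_t b)
{
  return fold(a, false) == fold(b, false);
}

// Mirrors three rows of the table: the two variants disagree on the
// first two pairs and agree on the third.
void check_examples()
{
  assert(eq_upper(0x017F, 0x0053) && !eq_lower(0x017F, 0x0053));
  assert(eq_lower(0x212A, 0x004B) && !eq_upper(0x212A, 0x004B));
  assert(eq_upper(0x0053, 0x0073) && eq_lower(0x0053, 0x0073));
}
```

            The long s pair is equal only under toupper, while the Kelvin sign pair is equal only under tolower, which matches the eq_upper and eq_lower columns for those rows.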

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a, IF(BINARY a>=b,a,b) AS b FROM t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Testing [ 10301 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Ramesh Sivaraman [ JIRAUSER48189 ]
            serg Sergei Golubchik made changes -
            Issue Type New Feature [ 2 ] Task [ 3 ]
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Alexander Barkov [ bar ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            issue.field.resolutiondate 2024-04-18 18:31:12.0 2024-04-18 18:31:11.662
            bar Alexander Barkov made changes -
            Fix Version/s 11.5.1 [ 29634 ]
            Fix Version/s 11.5 [ 29506 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            bar Alexander Barkov made changes -
            Description
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions:
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                               const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                 const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                        const uchar *str1, size_t len1,
                                        const uchar *str2, size_t len2,
                                        size_t nchars,
                                        uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to
            {code:cpp}
            my_charset_utf8mb4_general_as_ci->coll->strnncoll***()
            {code}

            The change should be largely mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. This part of the current task will switch some more C data types to LEX_CSTRING.
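
The pointer-plus-length refactoring described above can be pictured with a small standalone C++ sketch. It is not MariaDB code: NameRef is a hypothetical stand-in for LEX_CSTRING, compare_names() and name_from_cstr() are illustrative names, and the folding is ASCII-only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for LEX_CSTRING: a pointer plus an explicit length.
struct NameRef
{
  const char *str;
  std::size_t length;
};

// ASCII-only lower-case folding (an assumption made for brevity).
static unsigned char fold_ascii(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A'))
                                : c;
}

// Length-based comparison in the style of the strnncoll-like functions:
// the terminating 0 byte is neither required nor inspected.
int compare_names(const NameRef &a, const NameRef &b)
{
  assert(a.str != nullptr && b.str != nullptr);
  const std::size_t n= a.length < b.length ? a.length : b.length;
  for (std::size_t i= 0; i < n; i++)
  {
    const int diff= fold_ascii(static_cast<unsigned char>(a.str[i])) -
                    fold_ascii(static_cast<unsigned char>(b.str[i]));
    if (diff != 0)
      return diff;
  }
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}

// Adapter for call sites that still hold a 0-terminated string.
// It pays for a strlen() on every call, which is the cost that storing
// names with their length is meant to avoid.
NameRef name_from_cstr(const char *s)
{
  return NameRef{s, std::strlen(s)};
}
```

A call site that already knows the length can compare a prefix of a larger buffer directly, for example NameRef{"t1_suffix", 2} against name_from_cstr("T1"), without copying or terminating the string first.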

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci, which would compare the upper-case forms (instead of the lower-case forms).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that toupper comparison treats more pairs as equal than tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}
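
To make the asymmetry concrete, the following C++ sketch reproduces four rows of the table above. It is only an illustration: the function names are hypothetical, and the hand-written mappings cover ASCII letters plus U+0130, U+0131, U+017F and U+212A, whereas a real collation would rely on complete case mapping data.

```cpp
#include <cassert>

// Single code point upper-case mapping for ASCII and two special characters.
// Every other code point maps to itself (an assumption of this sketch).
char32_t simple_upper(char32_t c)
{
  if (c >= U'a' && c <= U'z')
    return static_cast<char32_t>(c - 0x20);
  switch (c)
  {
  case 0x0131: return 0x0049; // dotless i  -> I
  case 0x017F: return 0x0053; // long s     -> S
  default:     return c;
  }
}

// Single code point lower-case mapping for ASCII and two special characters.
char32_t simple_lower(char32_t c)
{
  if (c >= U'A' && c <= U'Z')
    return static_cast<char32_t>(c + 0x20);
  switch (c)
  {
  case 0x0130: return 0x0069; // I with dot above -> i
  case 0x212A: return 0x006B; // Kelvin sign      -> k
  default:     return c;
  }
}

// Equality after folding both sides to lower case (the tolower variant).
bool eq_lower(char32_t a, char32_t b)
{
  return simple_lower(a) == simple_lower(b);
}

// Equality after folding both sides to upper case (the toupper variant).
bool eq_upper(char32_t a, char32_t b)
{
  return simple_upper(a) == simple_upper(b);
}

// Checks the (eq_lower, eq_upper) columns for four rows of the table.
void check_rows_from_table()
{
  assert(!eq_lower(0x0049, 0x0131) &&  eq_upper(0x0049, 0x0131)); // 0 | 1
  assert( eq_lower(0x0049, 0x0130) && !eq_upper(0x0049, 0x0130)); // 1 | 0
  assert(!eq_lower(0x0053, 0x017F) &&  eq_upper(0x0053, 0x017F)); // 0 | 1
  assert( eq_lower(0x004B, 0x212A) && !eq_upper(0x004B, 0x212A)); // 1 | 0
}
```

The pattern is that a character with only an upper-case mapping (such as U+0131) is merged by the toupper variant alone, while a character with only a lower-case mapping (such as U+212A) is merged by the tolower variant alone.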

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a, IF(BINARY a>=b,a,b) AS b FROM t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description
            There are two parallel comparison systems in the MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                               const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                 const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                        const uchar *str1, size_t len1,
                                        const uchar *str2, size_t len2,
                                        size_t nchars,
                                        uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note that the accent and case sensitivity of the collation do not matter: strcasecmp() always performs an accent-sensitive, case-insensitive comparison.
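
As a rough illustration of what "accent sensitive, case insensitive" means here, the C++ sketch below compares code point sequences after lower-case folding. It is an assumption-laden toy, not the server's implementation: fold_lower() and equal_as_ci() are hypothetical names, and only ASCII and the Latin-1 capital letters are folded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fold one code point to lower case. Only ASCII letters and the Latin-1
// capitals U+00C0..U+00DE (except the multiplication sign U+00D7) are
// handled; everything else is returned unchanged.
char32_t fold_lower(char32_t c)
{
  if (c >= U'A' && c <= U'Z')
    return static_cast<char32_t>(c + 0x20);
  if (c >= 0x00C0 && c <= 0x00DE && c != 0x00D7)
    return static_cast<char32_t>(c + 0x20);
  return c;
}

// Case-insensitive, accent-sensitive equality: letters that differ only in
// case are equal, letters that differ in an accent are not.
bool equal_as_ci(const std::u32string &a, const std::u32string &b)
{
  if (a.size() != b.size())
    return false;
  for (std::size_t i= 0; i < a.size(); i++)
  {
    if (fold_lower(a[i]) != fold_lower(b[i]))
      return false;
  }
  return true;
}

// A few expectations that follow from the definition above.
void check_as_ci_examples()
{
  assert(equal_as_ci(U"Table", U"TABLE"));    // case difference only
  assert(equal_as_ci(U"\u00C1", U"\u00E1"));  // A-acute vs a-acute
  assert(!equal_as_ci(U"a", U"\u00E1"));      // accent difference
}
```

Because no accent is stripped during folding, the result does not depend on whether the surrounding collation is accent sensitive, which matches the behaviour described in the note above.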

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to
            {code:cpp}
            my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
            {code}

            The change should be largely mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. This part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci, which would compare the upper-case forms (instead of the lower-case forms).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison treats more pairs as equal than tolower comparison does:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a, IF(BINARY a>=b,a,b) AS b FROM t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Trade-offs of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls to:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            with calls to:
            {code:cpp}
            my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
            {code}

            The change should be largely mechanical. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-like functions accept a pointer and a length, so some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names, so this part of the current task will switch some more C data types to LEX_CSTRING.
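The interface difference described above can be illustrated with a minimal, self-contained sketch. It is not MariaDB code: {{NameRef}} is a hypothetical stand-in for a (pointer, length) type such as LEX_CSTRING, {{compare_names_ci()}} and {{name_from_cstr()}} are made-up names, and the case folding is ASCII-only, whereas the proposed collation is meant to cover the whole Unicode range.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a (pointer, length) name type such as LEX_CSTRING.
struct NameRef
{
  const char *str;
  std::size_t length;
};

// ASCII-only lower-casing (an assumption made to keep the sketch short);
// a real collation would fold characters from the whole Unicode range.
static unsigned char ascii_lower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Length-aware, case-insensitive comparison in the spirit of the
// strnncoll-like functions: returns <0, 0 or >0.
// Neither input needs a terminating '\0'.
int compare_names_ci(const NameRef &a, const NameRef &b)
{
  const std::size_t n = a.length < b.length ? a.length : b.length;
  for (std::size_t i = 0; i < n; i++)
  {
    const int ca = ascii_lower(static_cast<unsigned char>(a.str[i]));
    const int cb = ascii_lower(static_cast<unsigned char>(b.str[i]));
    if (ca != cb)
      return ca - cb;
  }
  if (a.length == b.length)
    return 0;
  return a.length < b.length ? -1 : 1;
}

// Adapter for call sites that still hold only a 0-terminated string.
NameRef name_from_cstr(const char *s)
{
  return NameRef{s, std::strlen(s)};
}
```

In this sketch, a caller that still has only a {{const char *}} pays for a {{strlen()}} on every comparison, while a caller that already stores the name together with its length can pass it through directly. That is one way to read the motivation for moving more data types to LEX_CSTRING.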

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general1400_as_ci, which would compare the uppercase forms of characters (instead of the lowercase forms).
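To see why the two styles can disagree, here is a small sketch that hard-codes the simple Unicode case mappings for a handful of code points (I, i, K, k, U+0131 LATIN SMALL LETTER DOTLESS I and U+212A KELVIN SIGN). The function names are hypothetical and the tables are deliberately tiny; a real collation would use complete case-mapping data.

```cpp
// Simple lowercase mapping for an illustrative subset of code points.
char32_t sketch_tolower(char32_t c)
{
  switch (c)
  {
  case 0x0049: return 0x0069; // I -> i
  case 0x004B: return 0x006B; // K -> k
  case 0x212A: return 0x006B; // KELVIN SIGN -> k
  default:     return c;      // i, k and U+0131 are already lowercase
  }
}

// Simple uppercase mapping for the same subset.
char32_t sketch_toupper(char32_t c)
{
  switch (c)
  {
  case 0x0069: return 0x0049; // i -> I
  case 0x006B: return 0x004B; // k -> K
  case 0x0131: return 0x0049; // dotless i -> I
  default:     return c;      // I, K and KELVIN SIGN have no uppercase mapping
  }
}

// Equality when both sides are folded to lowercase.
bool eq_lower(char32_t a, char32_t b)
{
  return sketch_tolower(a) == sketch_tolower(b);
}

// Equality when both sides are folded to uppercase.
bool eq_upper(char32_t a, char32_t b)
{
  return sketch_toupper(a) == sketch_toupper(b);
}
```

With these mappings, I (U+0049) and ı (U+0131) are equal only under uppercase folding, while K (U+004B) and the Kelvin sign (U+212A) are equal only under lowercase folding.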


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) AS eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note that the toupper comparison treats more pairs as equal than the tolower comparison does (96 vs 21 in t21):
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}
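A minimal sketch in C of why the two sums differ, assuming simple one-to-one case mappings. The names `demo_casemap`, `demo_fold()`, `demo_eq_lower()` and `demo_eq_upper()` are hypothetical and are not part of the MariaDB collation library. The mapping excerpt covers only a few code points taken from the rows listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a simple (one-to-one) case mapping table. */
typedef struct
{
  uint32_t cp;    /* code point */
  uint32_t upper; /* simple uppercase mapping */
  uint32_t lower; /* simple lowercase mapping */
} demo_case_pair;

/* Hypothetical excerpt: only the code points used in this illustration. */
static const demo_case_pair demo_casemap[]=
{
  {0x0049, 0x0049, 0x0069}, /* I */
  {0x0069, 0x0049, 0x0069}, /* i */
  {0x0130, 0x0130, 0x0069}, /* I WITH DOT ABOVE: lowercases to i */
  {0x0131, 0x0049, 0x0131}, /* DOTLESS I: uppercases to I */
  {0x004B, 0x004B, 0x006B}, /* K */
  {0x006B, 0x004B, 0x006B}, /* k */
  {0x212A, 0x212A, 0x006B}, /* KELVIN SIGN: lowercases to k */
  {0x0053, 0x0053, 0x0073}, /* S */
  {0x0073, 0x0053, 0x0073}, /* s */
  {0x017F, 0x0053, 0x017F}  /* LONG S: uppercases to S */
};

/* Fold one code point to upper case (to_upper != 0) or lower case. */
uint32_t demo_fold(uint32_t cp, int to_upper)
{
  size_t i;
  for (i= 0; i < sizeof(demo_casemap) / sizeof(demo_casemap[0]); i++)
  {
    if (demo_casemap[i].cp == cp)
      return to_upper ? demo_casemap[i].upper : demo_casemap[i].lower;
  }
  return cp; /* not in the excerpt: treated as caseless here */
}

/* Counterpart of the eq_lower column: equal after lowercasing both sides. */
int demo_eq_lower(uint32_t a, uint32_t b)
{
  return demo_fold(a, 0) == demo_fold(b, 0);
}

/* Counterpart of the eq_upper column: equal after uppercasing both sides. */
int demo_eq_upper(uint32_t a, uint32_t b)
{
  return demo_fold(a, 1) == demo_fold(b, 1);
}
```

For example, `demo_eq_upper(0x0049, 0x0131)` is true while `demo_eq_lower(0x0049, 0x0131)` is false, matching the `I`/`ı` row. For `K` (U+004B) and the Kelvin sign (U+212A) the outcome is reversed.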

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a, IF(BINARY a>=b,a,b) AS b FROM t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Trade-offs of toupper vs tolower comparison
            - The tolower variant will compare exactly as strcasecmp() did.
            - The toupper variant will compare much like utf8mb3_general_ci does.

            People

              bar Alexander Barkov
              bar Alexander Barkov