Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31340

Remove MY_COLLATION_HANDLER::strcasecmp()

Details

    Description

      There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

      • Comparison according to the collation, provided by these functions

          int     (*strnncoll)(CHARSET_INFO *,
                               const uchar *, size_t, const uchar *, size_t, my_bool);
          int     (*strnncollsp)(CHARSET_INFO *,
                                 const uchar *, size_t, const uchar *, size_t);
         
          int     (*strnncollsp_nchars)(CHARSET_INFO *,
                                        const uchar *str1, size_t len1,
                                        const uchar *str2, size_t len2,
                                        size_t nchars,
                                        uint flags);
        

      • Comparison in case insensitive (but accent sensitive) style, implemented by this function:

          int  (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
        

        Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

      These two parallel systems are redundant.

      Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

      Let's get rid of the second comparison system:
      1. Remove MY_COLLATION_HANDLER::strcasecmp()
      2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
      3. Replace all calls for:

      system_charset_info->coll->strcasecmp()
      

      to calls for

      my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
      

      The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

      tolower vs toupper comparison

      Another option is to implement utf8mb3_general1400_as_ci which will compare upper cases (instead of lower cases).

      The difference is in a few dozen BMP characters. This script finds all those characters:

      CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
      DELIMITER $$
      FOR i IN 0..0xFFFF
      DO
        INSERT INTO t1 VALUES (CHAR(i USING ucs2));
      END FOR;
      $$
      DELIMITER ;
      ALTER TABLE t1
        ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
        ADD KEY(has_casefolding, a);
       
      CREATE OR REPLACE TABLE t21 AS
      SELECT
        HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
        BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
        BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
        t1.a AS a, t2.a AS b
      FROM
        t1 t1, t1 t2
      WHERE
        t1.has_casefolding=1
      AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));
       
      SELECT
        HEX(CONVERT(a USING ucs2)) AS unicode_a,
        HEX(CONVERT(b USING ucs2)) AS unicode_b,
        t21.* FROM t21;
      

      +-----------+-----------+--------+--------+----------+----------+------+------+
      | unicode_a | unicode_b | hex_a  | hex_b  | eq_lower | eq_upper | a    | b    |
      +-----------+-----------+--------+--------+----------+----------+------+------+
      |      1E9E |      00DF | E1BA9E |   C39F |        1 |        0 |    ẞ |    ß |
      |      0399 |      0345 |   CE99 |   CD85 |        0 |        1 |    Ι |    ͅ |
      |      03B9 |      0345 |   CEB9 |   CD85 |        0 |        1 |    ι |    ͅ |
      |      1FBE |      0345 | E1BEBE |   CD85 |        0 |        1 |    ι |    ͅ |
      |      212B |      00C5 | E284AB |   C385 |        1 |        0 |    Å |    Å |
      |      212B |      00E5 | E284AB |   C3A5 |        1 |        0 |    Å |    å |
      |      00C5 |      212B |   C385 | E284AB |        1 |        0 |    Å |    Å |
      |      00E5 |      212B |   C3A5 | E284AB |        1 |        0 |    å |    Å |
      |      0130 |      0049 |   C4B0 |     49 |        1 |        0 |    İ |    I |
      |      0131 |      0049 |   C4B1 |     49 |        0 |        1 |    ı |    I |
      |      0130 |      0069 |   C4B0 |     69 |        1 |        0 |    İ |    i |
      |      0131 |      0069 |   C4B1 |     69 |        0 |        1 |    ı |    i |
      |      0049 |      0130 |     49 |   C4B0 |        1 |        0 |    I |    İ |
      |      0069 |      0130 |     69 |   C4B0 |        1 |        0 |    i |    İ |
      |      0049 |      0131 |     49 |   C4B1 |        0 |        1 |    I |    ı |
      |      0069 |      0131 |     69 |   C4B1 |        0 |        1 |    i |    ı |
      |      212A |      004B | E284AA |     4B |        1 |        0 |    K |    K |
      |      212A |      006B | E284AA |     6B |        1 |        0 |    K |    k |
      |      004B |      212A |     4B | E284AA |        1 |        0 |    K |    K |
      |      006B |      212A |     6B | E284AA |        1 |        0 |    k |    K |
      |      017F |      0053 |   C5BF |     53 |        0 |        1 |    ſ |    S |
      |      017F |      0073 |   C5BF |     73 |        0 |        1 |    ſ |    s |
      |      0053 |      017F |     53 |   C5BF |        0 |        1 |    S |    ſ |
      |      0073 |      017F |     73 |   C5BF |        0 |        1 |    s |    ſ |
      |      1E9B |      1E60 | E1BA9B | E1B9A0 |        0 |        1 |    ẛ |    Ṡ |
      |      1E9B |      1E61 | E1BA9B | E1B9A1 |        0 |        1 |    ẛ |    ṡ |
      |      1E60 |      1E9B | E1B9A0 | E1BA9B |        0 |        1 |    Ṡ |    ẛ |
      |      1E61 |      1E9B | E1B9A1 | E1BA9B |        0 |        1 |    ṡ |    ẛ |
      |      03D0 |      0392 |   CF90 |   CE92 |        0 |        1 |    ϐ |    Β |
      |      03D0 |      03B2 |   CF90 |   CEB2 |        0 |        1 |    ϐ |    β |
      |      0392 |      03D0 |   CE92 |   CF90 |        0 |        1 |    Β |    ϐ |
      |      03B2 |      03D0 |   CEB2 |   CF90 |        0 |        1 |    β |    ϐ |
      |      03F5 |      0395 |   CFB5 |   CE95 |        0 |        1 |    ϵ |    Ε |
      |      03F5 |      03B5 |   CFB5 |   CEB5 |        0 |        1 |    ϵ |    ε |
      |      0395 |      03F5 |   CE95 |   CFB5 |        0 |        1 |    Ε |    ϵ |
      |      03B5 |      03F5 |   CEB5 |   CFB5 |        0 |        1 |    ε |    ϵ |
      |      03D1 |      0398 |   CF91 |   CE98 |        0 |        1 |    ϑ |    Θ |
      |      03F4 |      0398 |   CFB4 |   CE98 |        1 |        0 |    ϴ |    Θ |
      |      03D1 |      03B8 |   CF91 |   CEB8 |        0 |        1 |    ϑ |    θ |
      |      03F4 |      03B8 |   CFB4 |   CEB8 |        1 |        0 |    ϴ |    θ |
      |      0398 |      03D1 |   CE98 |   CF91 |        0 |        1 |    Θ |    ϑ |
      |      03B8 |      03D1 |   CEB8 |   CF91 |        0 |        1 |    θ |    ϑ |
      |      0398 |      03F4 |   CE98 |   CFB4 |        1 |        0 |    Θ |    ϴ |
      |      03B8 |      03F4 |   CEB8 |   CFB4 |        1 |        0 |    θ |    ϴ |
      |      0345 |      0399 |   CD85 |   CE99 |        0 |        1 |    ͅ |    Ι |
      |      1FBE |      0399 | E1BEBE |   CE99 |        0 |        1 |    ι |    Ι |
      |      0345 |      03B9 |   CD85 |   CEB9 |        0 |        1 |    ͅ |    ι |
      |      1FBE |      03B9 | E1BEBE |   CEB9 |        0 |        1 |    ι |    ι |
      |      0345 |      1FBE |   CD85 | E1BEBE |        0 |        1 |    ͅ |    ι |
      |      0399 |      1FBE |   CE99 | E1BEBE |        0 |        1 |    Ι |    ι |
      |      03B9 |      1FBE |   CEB9 | E1BEBE |        0 |        1 |    ι |    ι |
      |      03F0 |      039A |   CFB0 |   CE9A |        0 |        1 |    ϰ |    Κ |
      |      03F0 |      03BA |   CFB0 |   CEBA |        0 |        1 |    ϰ |    κ |
      |      039A |      03F0 |   CE9A |   CFB0 |        0 |        1 |    Κ |    ϰ |
      |      03BA |      03F0 |   CEBA |   CFB0 |        0 |        1 |    κ |    ϰ |
      |      039C |      00B5 |   CE9C |   C2B5 |        0 |        1 |    Μ |    µ |
      |      03BC |      00B5 |   CEBC |   C2B5 |        0 |        1 |    μ |    µ |
      |      00B5 |      039C |   C2B5 |   CE9C |        0 |        1 |    µ |    Μ |
      |      00B5 |      03BC |   C2B5 |   CEBC |        0 |        1 |    µ |    μ |
      |      03D6 |      03A0 |   CF96 |   CEA0 |        0 |        1 |    ϖ |    Π |
      |      03D6 |      03C0 |   CF96 |   CF80 |        0 |        1 |    ϖ |    π |
      |      03A0 |      03D6 |   CEA0 |   CF96 |        0 |        1 |    Π |    ϖ |
      |      03C0 |      03D6 |   CF80 |   CF96 |        0 |        1 |    π |    ϖ |
      |      03F1 |      03A1 |   CFB1 |   CEA1 |        0 |        1 |    ϱ |    Ρ |
      |      03F1 |      03C1 |   CFB1 |   CF81 |        0 |        1 |    ϱ |    ρ |
      |      03A1 |      03F1 |   CEA1 |   CFB1 |        0 |        1 |    Ρ |    ϱ |
      |      03C1 |      03F1 |   CF81 |   CFB1 |        0 |        1 |    ρ |    ϱ |
      |      03C2 |      03A3 |   CF82 |   CEA3 |        0 |        1 |    ς |    Σ |
      |      03A3 |      03C2 |   CEA3 |   CF82 |        0 |        1 |    Σ |    ς |
      |      03C3 |      03C2 |   CF83 |   CF82 |        0 |        1 |    σ |    ς |
      |      03C2 |      03C3 |   CF82 |   CF83 |        0 |        1 |    ς |    σ |
      |      03D5 |      03A6 |   CF95 |   CEA6 |        0 |        1 |    ϕ |    Φ |
      |      03D5 |      03C6 |   CF95 |   CF86 |        0 |        1 |    ϕ |    φ |
      |      03A6 |      03D5 |   CEA6 |   CF95 |        0 |        1 |    Φ |    ϕ |
      |      03C6 |      03D5 |   CF86 |   CF95 |        0 |        1 |    φ |    ϕ |
      |      2126 |      03A9 | E284A6 |   CEA9 |        1 |        0 |    Ω |    Ω |
      |      2126 |      03C9 | E284A6 |   CF89 |        1 |        0 |    Ω |    ω |
      |      03A9 |      2126 |   CEA9 | E284A6 |        1 |        0 |    Ω |    Ω |
      |      03C9 |      2126 |   CF89 | E284A6 |        1 |        0 |    ω |    Ω |
      |      1C80 |      0412 | E1B280 |   D092 |        0 |        1 |    ᲀ |    В |
      |      1C80 |      0432 | E1B280 |   D0B2 |        0 |        1 |    ᲀ |    в |
      |      0412 |      1C80 |   D092 | E1B280 |        0 |        1 |    В |    ᲀ |
      |      0432 |      1C80 |   D0B2 | E1B280 |        0 |        1 |    в |    ᲀ |
      |      1C81 |      0414 | E1B281 |   D094 |        0 |        1 |    ᲁ |    Д |
      |      1C81 |      0434 | E1B281 |   D0B4 |        0 |        1 |    ᲁ |    д |
      |      0414 |      1C81 |   D094 | E1B281 |        0 |        1 |    Д |    ᲁ |
      |      0434 |      1C81 |   D0B4 | E1B281 |        0 |        1 |    д |    ᲁ |
      |      1C82 |      041E | E1B282 |   D09E |        0 |        1 |    ᲂ |    О |
      |      1C82 |      043E | E1B282 |   D0BE |        0 |        1 |    ᲂ |    о |
      |      041E |      1C82 |   D09E | E1B282 |        0 |        1 |    О |    ᲂ |
      |      043E |      1C82 |   D0BE | E1B282 |        0 |        1 |    о |    ᲂ |
      |      1C83 |      0421 | E1B283 |   D0A1 |        0 |        1 |    ᲃ |    С |
      |      1C83 |      0441 | E1B283 |   D181 |        0 |        1 |    ᲃ |    с |
      |      0421 |      1C83 |   D0A1 | E1B283 |        0 |        1 |    С |    ᲃ |
      |      0441 |      1C83 |   D181 | E1B283 |        0 |        1 |    с |    ᲃ |
      |      1C84 |      0422 | E1B284 |   D0A2 |        0 |        1 |    ᲄ |    Т |
      |      1C85 |      0422 | E1B285 |   D0A2 |        0 |        1 |    ᲅ |    Т |
      |      1C84 |      0442 | E1B284 |   D182 |        0 |        1 |    ᲄ |    т |
      |      1C85 |      0442 | E1B285 |   D182 |        0 |        1 |    ᲅ |    т |
      |      0422 |      1C84 |   D0A2 | E1B284 |        0 |        1 |    Т |    ᲄ |
      |      0442 |      1C84 |   D182 | E1B284 |        0 |        1 |    т |    ᲄ |
      |      1C85 |      1C84 | E1B285 | E1B284 |        0 |        1 |    ᲅ |    ᲄ |
      |      0422 |      1C85 |   D0A2 | E1B285 |        0 |        1 |    Т |    ᲅ |
      |      0442 |      1C85 |   D182 | E1B285 |        0 |        1 |    т |    ᲅ |
      |      1C84 |      1C85 | E1B284 | E1B285 |        0 |        1 |    ᲄ |    ᲅ |
      |      A64A |      1C88 | EA998A | E1B288 |        0 |        1 |    Ꙋ |    ᲈ |
      |      A64B |      1C88 | EA998B | E1B288 |        0 |        1 |    ꙋ |    ᲈ |
      |      1C88 |      A64A | E1B288 | EA998A |        0 |        1 |    ᲈ |    Ꙋ |
      |      1C88 |      A64B | E1B288 | EA998B |        0 |        1 |    ᲈ |    ꙋ |
      |      1C86 |      042A | E1B286 |   D0AA |        0 |        1 |    ᲆ |    Ъ |
      |      1C86 |      044A | E1B286 |   D18A |        0 |        1 |    ᲆ |    ъ |
      |      042A |      1C86 |   D0AA | E1B286 |        0 |        1 |    Ъ |    ᲆ |
      |      044A |      1C86 |   D18A | E1B286 |        0 |        1 |    ъ |    ᲆ |
      |      1C87 |      0462 | E1B287 |   D1A2 |        0 |        1 |    ᲇ |    Ѣ |
      |      1C87 |      0463 | E1B287 |   D1A3 |        0 |        1 |    ᲇ |    ѣ |
      |      0462 |      1C87 |   D1A2 | E1B287 |        0 |        1 |    Ѣ |    ᲇ |
      |      0463 |      1C87 |   D1A3 | E1B287 |        0 |        1 |    ѣ |    ᲇ |
      +-----------+-----------+--------+--------+----------+----------+------+------+
      

      Note, toupper comparison considers more pairs as equal than tolower comparison:

      SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
      

      +---------------+---------------+
      | SUM(eq_lower) | SUM(eq_upper) |
      +---------------+---------------+
      |            21 |            96 |
      +---------------+---------------+
      

      A more compact table with distinct pairs (the character with a smaller code point is on the left)

      SELECT
        HEX(CONVERT(a USING ucs2)) AS unicode_a,
        HEX(CONVERT(b USING ucs2)) AS unicode_b,
        HEX(a), HEX(b),
        BINARY LOWER(a)=LOWER(b) AS eq_lower,
        BINARY UPPER(a)=UPPER(b) AS eq_upper,
        a,b
      FROM
      (
        SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
      ) d1
      ORDER BY eq_lower, unicode_a, unicode_b;
      

      +-----------+-----------+--------+--------+----------+----------+-----+-----+
      | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a   | b   |
      +-----------+-----------+--------+--------+----------+----------+-----+-----+
      |      0049 |      0131 |     49 |   C4B1 |        0 |        1 |   I |   ı |
      |      0053 |      017F |     53 |   C5BF |        0 |        1 |   S |   ſ |
      |      00B5 |      039C |   C2B5 |   CE9C |        0 |        1 |   µ |   Μ |
      |      0345 |      0399 |   CD85 |   CE99 |        0 |        1 |   ͅ |   Ι |
      |      0392 |      03D0 |   CE92 |   CF90 |        0 |        1 |   Β |   ϐ |
      |      0395 |      03F5 |   CE95 |   CFB5 |        0 |        1 |   Ε |   ϵ |
      |      0398 |      03D1 |   CE98 |   CF91 |        0 |        1 |   Θ |   ϑ |
      |      0399 |      1FBE |   CE99 | E1BEBE |        0 |        1 |   Ι |   ι |
      |      039A |      03F0 |   CE9A |   CFB0 |        0 |        1 |   Κ |   ϰ |
      |      03A0 |      03D6 |   CEA0 |   CF96 |        0 |        1 |   Π |   ϖ |
      |      03A1 |      03F1 |   CEA1 |   CFB1 |        0 |        1 |   Ρ |   ϱ |
      |      03A3 |      03C2 |   CEA3 |   CF82 |        0 |        1 |   Σ |   ς |
      |      03A6 |      03D5 |   CEA6 |   CF95 |        0 |        1 |   Φ |   ϕ |
      |      0412 |      1C80 |   D092 | E1B280 |        0 |        1 |   В |   ᲀ |
      |      0414 |      1C81 |   D094 | E1B281 |        0 |        1 |   Д |   ᲁ |
      |      041E |      1C82 |   D09E | E1B282 |        0 |        1 |   О |   ᲂ |
      |      0421 |      1C83 |   D0A1 | E1B283 |        0 |        1 |   С |   ᲃ |
      |      0422 |      1C84 |   D0A2 | E1B284 |        0 |        1 |   Т |   ᲄ |
      |      042A |      1C86 |   D0AA | E1B286 |        0 |        1 |   Ъ |   ᲆ |
      |      0462 |      1C87 |   D1A2 | E1B287 |        0 |        1 |   Ѣ |   ᲇ |
      |      1C88 |      A64A | E1B288 | EA998A |        0 |        1 |   ᲈ |   Ꙋ |
      |      0049 |      0130 |     49 |   C4B0 |        1 |        0 |   I |   İ |
      |      004B |      212A |     4B | E284AA |        1 |        0 |   K |   K |
      |      00C5 |      212B |   C385 | E284AB |        1 |        0 |   Å |   Å |
      |      00DF |      1E9E |   C39F | E1BA9E |        1 |        0 |   ß |   ẞ |
      |      03A9 |      2126 |   CEA9 | E284A6 |        1 |        0 |   Ω |   Ω |
      +-----------+-----------+--------+--------+----------+----------+-----+-----+
      26 rows in set (0.005 sec)
      

      Cons of toupper vs tolower comparison

      • The tolower variant will compare exactly like strcasecmp() did
      • The toupper variant will compare close to how utf8mb3_general_ci works

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            bar Alexander Barkov made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.3 [ 28565 ]
            bar Alexander Barkov added a comment - - edited
            bar Alexander Barkov added a comment - - edited Hello serg , Please review a patch for this task: https://github.com/MariaDB/server/commit/02b85f1cf9bbdaf4c0fb65ab4c259eb1d8446e1c Thanks.
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Issue Type New Feature [ 2 ] Task [ 3 ]
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.


            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}

            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}

            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}


            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}


            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT * FROM t21;
            {code}
            {noformat}
            +--------+--------+----------+----------+------+------+
            | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +--------+--------+----------+----------+------+------+
            | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | E284AB | C385 | 1 | 0 | Å | Å |
            | E284AB | C3A5 | 1 | 0 | Å | å |
            | C385 | E284AB | 1 | 0 | Å | Å |
            | C3A5 | E284AB | 1 | 0 | å | Å |
            | C4B0 | 49 | 1 | 0 | İ | I |
            | C4B1 | 49 | 0 | 1 | ı | I |
            | C4B0 | 69 | 1 | 0 | İ | i |
            | C4B1 | 69 | 0 | 1 | ı | i |
            | 49 | C4B0 | 1 | 0 | I | İ |
            | 69 | C4B0 | 1 | 0 | i | İ |
            | 49 | C4B1 | 0 | 1 | I | ı |
            | 69 | C4B1 | 0 | 1 | i | ı |
            | E284AA | 4B | 1 | 0 | K | K |
            | E284AA | 6B | 1 | 0 | K | k |
            | 4B | E284AA | 1 | 0 | K | K |
            | 6B | E284AA | 1 | 0 | k | K |
            | C5BF | 53 | 0 | 1 | ſ | S |
            | C5BF | 73 | 0 | 1 | ſ | s |
            | 53 | C5BF | 0 | 1 | S | ſ |
            | 73 | C5BF | 0 | 1 | s | ſ |
            | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | CEBC | C2B5 | 0 | 1 | μ | µ |
            | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | C2B5 | CEBC | 0 | 1 | µ | μ |
            | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | CF96 | CF80 | 0 | 1 | ϖ | π |
            | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | CF80 | CF96 | 0 | 1 | π | ϖ |
            | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | CF83 | CF82 | 0 | 1 | σ | ς |
            | CF82 | CF83 | 0 | 1 | ς | σ |
            | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, a, b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, a, b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_tolower_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_tolower_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_toupper_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -

            Hello serg,

            This is a new patch version:
            https://github.com/MariaDB/server/commit/15c315211befe7da677f9a533f0ceabe21caf9a8

            Replied to your review comments by email.

            bar Alexander Barkov added a comment - Hello serg , This is a new patch version: https://github.com/MariaDB/server/commit/15c315211befe7da677f9a533f0ceabe21caf9a8 Replied to your review comments by email.
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            I reviewed the InnoDB changes, except those related to MDEV-31953 and mem_pressure, which I assume will not be part of the final version.

            For dict_table_schema_check(), it could be better to modify the type of dict_col_meta_t::name and the declarations of table_stats_schema and index_stats_schema. The constants innobase_index_reserve_name_ls, FTS_DOC_ID_COL_NAME_ls and FTS_DOC_ID_INDEX_NAME_ls could be renamed to less obfuscated ones GEN_CLUST_INDEX_ls, FTS_DOC_ID_ls and FTS_DOC_ID_INDEX_ls. Why are those constant defined as static const in header files? Could they be static constexpr instead? Or could the constants be declared in the header file, and defined only once in one compilation unit?

            Have you measured the code size and performance impact of defining dict_table_get_col_name_ls() and similar functions inline? As far as I understand, it could be better to emit function calls than to duplicate code that includes loops or conditional branches.

            Should the following be removed from client/mysql.cc?

            #define cmp_database(A,B) my_strcasecmp_latin1((A), (B))
            

            Should the functions my_strcasecmp_8bit() and my_strcasecmp_latin1() continue to be defined?

            marko Marko Mäkelä added a comment - I reviewed the InnoDB changes, except those related to MDEV-31953 and mem_pressure , which I assume will not be part of the final version. For dict_table_schema_check() , it could be better to modify the type of dict_col_meta_t::name and the declarations of table_stats_schema and index_stats_schema . The constants innobase_index_reserve_name_ls , FTS_DOC_ID_COL_NAME_ls and FTS_DOC_ID_INDEX_NAME_ls could be renamed to less obfuscated ones GEN_CLUST_INDEX_ls , FTS_DOC_ID_ls and FTS_DOC_ID_INDEX_ls . Why are those constant defined as static const in header files? Could they be static constexpr instead? Or could the constants be declared in the header file, and defined only once in one compilation unit? Have you measured the code size and performance impact of defining dict_table_get_col_name_ls() and similar functions inline? As far as I understand, it could be better to emit function calls than to duplicate code that includes loops or conditional branches. Should the following be removed from client/mysql.cc ? #define cmp_database(A,B) my_strcasecmp_latin1((A), (B)) Should the functions my_strcasecmp_8bit() and my_strcasecmp_latin1() continue to be defined?
            bar Alexander Barkov made changes -

            Hi marko! Thanks for your review!
            Please find a new version here:
            https://github.com/MariaDB/server/commit/0f52062f908a3dbe64f7ba6929584a896117c7f3

            I reviewed the InnoDB changes, except those related to MDEV-31953 and mem_pressure, which I assume will not be part of the final version.

            True, the mem_pressure part won't be in the final version.

            For dict_table_schema_check(), it could be better to modify the type of dict_col_meta_t::name and the declarations of table_stats_schema and index_stats_schema.

            Done. It looks better. Thanks for this suggestion.

            The constants innobase_index_reserve_name_ls, FTS_DOC_ID_COL_NAME_ls and FTS_DOC_ID_INDEX_NAME_ls could be renamed to less obfuscated ones GEN_CLUST_INDEX_ls, FTS_DOC_ID_ls and FTS_DOC_ID_INDEX_ls.

            This rename does not seem to relate to the current task. Can we do it in a separate patch if needed?

            Why are those constant defined as static const in header files? Could they be static constexpr instead? Or could the constants be declared in the header file, and defined only once in one compilation unit?

            I changed them to static constexpr.

            Have you measured the code size and performance impact of defining dict_table_get_col_name_ls() and similar functions inline? As far as I understand, it could be better to emit function calls than to duplicate code that includes loops or conditional branches.

            No I have not. dict_table_get_col_name_ls() does not seem to have conditional branches.

            Should the following be removed from client/mysql.cc?

            #define cmp_database(A,B) my_strcasecmp_latin1((A), (B))

            cmp_database() is used in the code. Why remove it?

            Should the functions my_strcasecmp_8bit() and my_strcasecmp_latin1() continue to be defined?

            Yes, latin1 case insensitive comparison is still used in the code.

            bar Alexander Barkov added a comment - Hi marko ! Thanks for your review! Please find a new version here: https://github.com/MariaDB/server/commit/0f52062f908a3dbe64f7ba6929584a896117c7f3 I reviewed the InnoDB changes, except those related to MDEV-31953 and mem_pressure, which I assume will not be part of the final version. True, the mem_pressure part won't be in the final version. For dict_table_schema_check(), it could be better to modify the type of dict_col_meta_t::name and the declarations of table_stats_schema and index_stats_schema. Done. It looks better. Thanks for this suggestion. The constants innobase_index_reserve_name_ls, FTS_DOC_ID_COL_NAME_ls and FTS_DOC_ID_INDEX_NAME_ls could be renamed to less obfuscated ones GEN_CLUST_INDEX_ls, FTS_DOC_ID_ls and FTS_DOC_ID_INDEX_ls. This rename does not seem to relate to the current task. Can we do it in a separate patch if needed? Why are those constant defined as static const in header files? Could they be static constexpr instead? Or could the constants be declared in the header file, and defined only once in one compilation unit? I changed them to static constexpr. Have you measured the code size and performance impact of defining dict_table_get_col_name_ls() and similar functions inline? As far as I understand, it could be better to emit function calls than to duplicate code that includes loops or conditional branches. No I have not. dict_table_get_col_name_ls() does not seem to have conditional branches. Should the following be removed from client/mysql.cc? #define cmp_database(A,B) my_strcasecmp_latin1((A), (B)) cmp_database() is used in the code. Why remove it? Should the functions my_strcasecmp_8bit() and my_strcasecmp_latin1() continue to be defined? Yes, latin1 case insensitive comparison is still used in the code.

            Right, the function dict_table_get_col_name_ls() may not directly expand to any conditional branches (depending on the build parameters), but its execution will include them, for example strlen(). In fact, dict_table_get_col_name_ls() is duplicating some effort by invoking strlen() one more time, after dict_col_t::name() had already computed it for the returned name.

            On my system, even when I enable -O3 build, I observe that dict_table_has_column() is actually invoking dict_table_get_col_name_ls() via a call instruction and not expanding it inline. I think that it would be better to declare and define dict_table_get_col_name_ls() and dict_table_get_v_col_name_ls() without inline. The functions dict_col_t::name() and dict_table_get_v_col_name() had better be refactored, something like this:

            static const char *get_name(std::pair<const char*,size_t> n)
            {
              if (n.first)
                for (size_t i= 0; i < n.second; i++)
                  n.first+= strlen(n.first) + 1;
              return n.first;
            }
             
            static Lex_ident_column get_name_ls(std::pair<const char*,size_t> n); // to be defined
             
            const char *dict_col_t::name_low(const dict_table_t& table) const
            {
              // same code as name(), but without the for loop that was refactored to get_name() above
            }
            const char *dict_col_t::name(const dict_table_t& table) const
            {
              return get_name(name_low(table));
            }
            static const char *dict_table_get_v_col_name_low(const dict_table_t *table, ulint col_nr)
            {
              // same as the old code, but without the for loop at the end
            }
            const char *dict_table_get_v_col_name(const dict_table_t *table, ulint col_nr)
            {
              return get_name(dict_table_get_v_col_name_low(table, col_nr));
            }
            Lex_ident_column dict_col_t::name_ls(const dict_table_t &table) const
            {
              return get_name_ls(name_low(table));
            }
            Lex_ident_column dict_table_get_v_col_name_ls(const dict_table_t * table, ulint col_nr)
            {
              return get_name_ls(dict_table_get_v_col_name_low(table, col_nr));
            }
            

            The get_name() part is currently duplicated in both functions. You would write get_name_ls() in such a way that the duplicate call to strlen() would be avoided.

            Having less inline code defined in common header files should improve the compilation time. Since this is infrequently called code, any runtime speed advantage of inlining should not matter. Reduced code size could lead to faster running and easier-to-debug code.

            marko Marko Mäkelä added a comment - Right, the function dict_table_get_col_name_ls() may not directly expand to any conditional branches (depending on the build parameters), but its execution will include them, for example strlen() . In fact, dict_table_get_col_name_ls() is duplicating some effort by invoking strlen() one more time, after dict_col_t::name() had already computed it for the returned name. On my system, even when I enable -O3 build, I observe that dict_table_has_column() is actually invoking dict_table_get_col_name_ls() via a call instruction and not expanding it inline. I think that it would be better to declare and define dict_table_get_col_name_ls() and dict_table_get_v_col_name_ls() without inline . The functions dict_col_t::name() and dict_table_get_v_col_name() had better be refactored, something like this: static const char *get_name(std::pair< const char *, size_t > n) { if (n.first) for ( size_t i= 0; i < n.second; i++) n.first+= strlen (n.first) + 1; return n.first; }   static Lex_ident_column get_name_ls(std::pair< const char *, size_t > n); // to be defined   const char *dict_col_t::name_low( const dict_table_t& table) const { // same code as name(), but without the for loop that was refactored to get_name() above } const char *dict_col_t::name( const dict_table_t& table) const { return get_name(name_low(table)); } static const char *dict_table_get_v_col_name_low( const dict_table_t *table, ulint col_nr) { // same as the old code, but without the for loop at the end } const char *dict_table_get_v_col_name( const dict_table_t *table, ulint col_nr) { return get_name(dict_table_get_v_col_name_low(table, col_nr)); } Lex_ident_column dict_col_t::name_ls( const dict_table_t &table) const { return get_name_ls(name_low(table)); } Lex_ident_column dict_table_get_v_col_name_ls( const dict_table_t * table, ulint col_nr) { return get_name_ls(dict_table_get_v_col_name_low(table, col_nr)); } The get_name() part is currently duplicated in both functions. You would write get_name_ls() in such a way that the duplicate call to strlen() would be avoided. Having less inline code defined in common header files should improve the compilation time. Since this is infrequently called code, any runtime speed advantage of inlining should not matter. Reduced code size could lead to faster running and easier-to-debug code.

            Hi marko, thanks for your suggestions.
            Please see a new patch here:
            https://github.com/MariaDB/server/commits/bb-11.4-bar-MDEV-31340/

            bar Alexander Barkov added a comment - Hi marko , thanks for your suggestions. Please see a new patch here: https://github.com/MariaDB/server/commits/bb-11.4-bar-MDEV-31340/

            Thank you, bar, the InnoDB changes look good to me. For any new or rewritten functions, such as dict_table_t::get_name_from_z_list() or dict_col_t::name(), I think that it is best to use the common formatting style instead of the archaic InnoDB one.

            marko Marko Mäkelä added a comment - Thank you, bar , the InnoDB changes look good to me. For any new or rewritten functions, such as dict_table_t::get_name_from_z_list() or dict_col_t::name() , I think that it is best to use the common formatting style instead of the archaic InnoDB one.
            bar Alexander Barkov added a comment - - edited

            Hi serg,

            Please find the latest patch here:

            https://github.com/MariaDB/server/commit/be66e975ecf15ae696d2e645da4aaf1a06cf0082

            It addresses your and Marko's comments.

            Thanks.

            bar Alexander Barkov added a comment - - edited Hi serg , Please find the latest patch here: https://github.com/MariaDB/server/commit/be66e975ecf15ae696d2e645da4aaf1a06cf0082 It addresses your and Marko's comments. Thanks.
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -

            Non-InnoDB part of be66e975ecf is ok to push too

            serg Sergei Golubchik added a comment - Non-InnoDB part of be66e975ecf is ok to push too
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Testing [ 10301 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Ramesh Sivaraman [ JIRAUSER48189 ]

            Hello ramesh,

            Can you please do testing for this task?

            The patch can be found here:
            https://github.com/MariaDB/server/commit/c9dc8e09bbec4e5eca3073e221f1e4dbbcf23873

            or using the branch URL:
            https://github.com/MariaDB/server/tree/bb-11.5-bar-MDEV-31340

            bar Alexander Barkov added a comment - Hello ramesh , Can you please do testing for this task? The patch can be found here: https://github.com/MariaDB/server/commit/c9dc8e09bbec4e5eca3073e221f1e4dbbcf23873 or using the branch URL: https://github.com/MariaDB/server/tree/bb-11.5-bar-MDEV-31340

            bar Sure.

            ramesh Ramesh Sivaraman added a comment - bar Sure.
            serg Sergei Golubchik made changes -
            Issue Type New Feature [ 2 ] Task [ 3 ]
            serg Sergei Golubchik made changes -

            bar, I've pushed a fixup commit into your branch, it's clearly named "fixup", so please rename it or squash with the main commit, whatever you prefer

            serg Sergei Golubchik added a comment - bar , I've pushed a fixup commit into your branch, it's clearly named "fixup", so please rename it or squash with the main commit, whatever you prefer
            ramesh Ramesh Sivaraman made changes -

            ok to push

            ramesh Ramesh Sivaraman added a comment - ok to push
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Alexander Barkov [ bar ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            issue.field.resolutiondate 2024-04-18 18:31:12.0 2024-04-18 18:31:11.662
            bar Alexander Barkov made changes -
            Fix Version/s 11.5.1 [ 29634 ]
            Fix Version/s 11.5 [ 29506 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            Description There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

            - Comparison according to the collation, provided by these functions
            {code:cpp}
              int (*strnncoll)(CHARSET_INFO *,
                                   const uchar *, size_t, const uchar *, size_t, my_bool);
              int (*strnncollsp)(CHARSET_INFO *,
                                     const uchar *, size_t, const uchar *, size_t);

              int (*strnncollsp_nchars)(CHARSET_INFO *,
                                            const uchar *str1, size_t len1,
                                            const uchar *str2, size_t len2,
                                            size_t nchars,
                                            uint flags);
            {code}


            - Comparison in case insensitive (but accent sensitive) style, implemented by this function:
            {code:cpp}
              int (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
            {code}
            Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

            These two parallel systems are redundant.

            Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

            Let's get rid of the second comparison system:
            1. Remove MY_COLLATION_HANDLER::strcasecmp()
            2. Introduce a new collation utf8mb4_general1400_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
            3. Replace all calls for:
            {code:cpp}
            system_charset_info->coll->strcasecmp()
            {code}
            to calls for
            {code:cpp}
            my_charset_utf8mb4_general1400_as_ci->coll->strnncoll***()
            {code}

            The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

            h2. tolower vs toupper comparison

            Another option is to implement utf8mb3_general1400_as_ci which will compare upper cases (instead of lower cases).


            The difference is in a few dozen BMP characters. This script finds all those characters:
            {code:sql}
            CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
            DELIMITER $$
            FOR i IN 0..0xFFFF
            DO
              INSERT INTO t1 VALUES (CHAR(i USING ucs2));
            END FOR;
            $$
            DELIMITER ;
            ALTER TABLE t1
              ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
              ADD KEY(has_casefolding, a);

            CREATE OR REPLACE TABLE t21 AS
            SELECT
              HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
              BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
              BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
              t1.a AS a, t2.a AS b
            FROM
              t1 t1, t1 t2
            WHERE
              t1.has_casefolding=1
            AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));

            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              t21.* FROM t21;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | unicode_a | unicode_b | hex_a | hex_b | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            | 1E9E | 00DF | E1BA9E | C39F | 1 | 0 | ẞ | ß |
            | 0399 | 0345 | CE99 | CD85 | 0 | 1 | Ι | ͅ |
            | 03B9 | 0345 | CEB9 | CD85 | 0 | 1 | ι | ͅ |
            | 1FBE | 0345 | E1BEBE | CD85 | 0 | 1 | ι | ͅ |
            | 212B | 00C5 | E284AB | C385 | 1 | 0 | Å | Å |
            | 212B | 00E5 | E284AB | C3A5 | 1 | 0 | Å | å |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00E5 | 212B | C3A5 | E284AB | 1 | 0 | å | Å |
            | 0130 | 0049 | C4B0 | 49 | 1 | 0 | İ | I |
            | 0131 | 0049 | C4B1 | 49 | 0 | 1 | ı | I |
            | 0130 | 0069 | C4B0 | 69 | 1 | 0 | İ | i |
            | 0131 | 0069 | C4B1 | 69 | 0 | 1 | ı | i |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 0069 | 0130 | 69 | C4B0 | 1 | 0 | i | İ |
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0069 | 0131 | 69 | C4B1 | 0 | 1 | i | ı |
            | 212A | 004B | E284AA | 4B | 1 | 0 | K | K |
            | 212A | 006B | E284AA | 6B | 1 | 0 | K | k |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 006B | 212A | 6B | E284AA | 1 | 0 | k | K |
            | 017F | 0053 | C5BF | 53 | 0 | 1 | ſ | S |
            | 017F | 0073 | C5BF | 73 | 0 | 1 | ſ | s |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 0073 | 017F | 73 | C5BF | 0 | 1 | s | ſ |
            | 1E9B | 1E60 | E1BA9B | E1B9A0 | 0 | 1 | ẛ | Ṡ |
            | 1E9B | 1E61 | E1BA9B | E1B9A1 | 0 | 1 | ẛ | ṡ |
            | 1E60 | 1E9B | E1B9A0 | E1BA9B | 0 | 1 | Ṡ | ẛ |
            | 1E61 | 1E9B | E1B9A1 | E1BA9B | 0 | 1 | ṡ | ẛ |
            | 03D0 | 0392 | CF90 | CE92 | 0 | 1 | ϐ | Β |
            | 03D0 | 03B2 | CF90 | CEB2 | 0 | 1 | ϐ | β |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 03B2 | 03D0 | CEB2 | CF90 | 0 | 1 | β | ϐ |
            | 03F5 | 0395 | CFB5 | CE95 | 0 | 1 | ϵ | Ε |
            | 03F5 | 03B5 | CFB5 | CEB5 | 0 | 1 | ϵ | ε |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 03B5 | 03F5 | CEB5 | CFB5 | 0 | 1 | ε | ϵ |
            | 03D1 | 0398 | CF91 | CE98 | 0 | 1 | ϑ | Θ |
            | 03F4 | 0398 | CFB4 | CE98 | 1 | 0 | ϴ | Θ |
            | 03D1 | 03B8 | CF91 | CEB8 | 0 | 1 | ϑ | θ |
            | 03F4 | 03B8 | CFB4 | CEB8 | 1 | 0 | ϴ | θ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 03B8 | 03D1 | CEB8 | CF91 | 0 | 1 | θ | ϑ |
            | 0398 | 03F4 | CE98 | CFB4 | 1 | 0 | Θ | ϴ |
            | 03B8 | 03F4 | CEB8 | CFB4 | 1 | 0 | θ | ϴ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 1FBE | 0399 | E1BEBE | CE99 | 0 | 1 | ι | Ι |
            | 0345 | 03B9 | CD85 | CEB9 | 0 | 1 | ͅ | ι |
            | 1FBE | 03B9 | E1BEBE | CEB9 | 0 | 1 | ι | ι |
            | 0345 | 1FBE | CD85 | E1BEBE | 0 | 1 | ͅ | ι |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 03B9 | 1FBE | CEB9 | E1BEBE | 0 | 1 | ι | ι |
            | 03F0 | 039A | CFB0 | CE9A | 0 | 1 | ϰ | Κ |
            | 03F0 | 03BA | CFB0 | CEBA | 0 | 1 | ϰ | κ |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03BA | 03F0 | CEBA | CFB0 | 0 | 1 | κ | ϰ |
            | 039C | 00B5 | CE9C | C2B5 | 0 | 1 | Μ | µ |
            | 03BC | 00B5 | CEBC | C2B5 | 0 | 1 | μ | µ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 00B5 | 03BC | C2B5 | CEBC | 0 | 1 | µ | μ |
            | 03D6 | 03A0 | CF96 | CEA0 | 0 | 1 | ϖ | Π |
            | 03D6 | 03C0 | CF96 | CF80 | 0 | 1 | ϖ | π |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03C0 | 03D6 | CF80 | CF96 | 0 | 1 | π | ϖ |
            | 03F1 | 03A1 | CFB1 | CEA1 | 0 | 1 | ϱ | Ρ |
            | 03F1 | 03C1 | CFB1 | CF81 | 0 | 1 | ϱ | ρ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03C1 | 03F1 | CF81 | CFB1 | 0 | 1 | ρ | ϱ |
            | 03C2 | 03A3 | CF82 | CEA3 | 0 | 1 | ς | Σ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03C3 | 03C2 | CF83 | CF82 | 0 | 1 | σ | ς |
            | 03C2 | 03C3 | CF82 | CF83 | 0 | 1 | ς | σ |
            | 03D5 | 03A6 | CF95 | CEA6 | 0 | 1 | ϕ | Φ |
            | 03D5 | 03C6 | CF95 | CF86 | 0 | 1 | ϕ | φ |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 03C6 | 03D5 | CF86 | CF95 | 0 | 1 | φ | ϕ |
            | 2126 | 03A9 | E284A6 | CEA9 | 1 | 0 | Ω | Ω |
            | 2126 | 03C9 | E284A6 | CF89 | 1 | 0 | Ω | ω |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            | 03C9 | 2126 | CF89 | E284A6 | 1 | 0 | ω | Ω |
            | 1C80 | 0412 | E1B280 | D092 | 0 | 1 | ᲀ | В |
            | 1C80 | 0432 | E1B280 | D0B2 | 0 | 1 | ᲀ | в |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0432 | 1C80 | D0B2 | E1B280 | 0 | 1 | в | ᲀ |
            | 1C81 | 0414 | E1B281 | D094 | 0 | 1 | ᲁ | Д |
            | 1C81 | 0434 | E1B281 | D0B4 | 0 | 1 | ᲁ | д |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 0434 | 1C81 | D0B4 | E1B281 | 0 | 1 | д | ᲁ |
            | 1C82 | 041E | E1B282 | D09E | 0 | 1 | ᲂ | О |
            | 1C82 | 043E | E1B282 | D0BE | 0 | 1 | ᲂ | о |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 043E | 1C82 | D0BE | E1B282 | 0 | 1 | о | ᲂ |
            | 1C83 | 0421 | E1B283 | D0A1 | 0 | 1 | ᲃ | С |
            | 1C83 | 0441 | E1B283 | D181 | 0 | 1 | ᲃ | с |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0441 | 1C83 | D181 | E1B283 | 0 | 1 | с | ᲃ |
            | 1C84 | 0422 | E1B284 | D0A2 | 0 | 1 | ᲄ | Т |
            | 1C85 | 0422 | E1B285 | D0A2 | 0 | 1 | ᲅ | Т |
            | 1C84 | 0442 | E1B284 | D182 | 0 | 1 | ᲄ | т |
            | 1C85 | 0442 | E1B285 | D182 | 0 | 1 | ᲅ | т |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 0442 | 1C84 | D182 | E1B284 | 0 | 1 | т | ᲄ |
            | 1C85 | 1C84 | E1B285 | E1B284 | 0 | 1 | ᲅ | ᲄ |
            | 0422 | 1C85 | D0A2 | E1B285 | 0 | 1 | Т | ᲅ |
            | 0442 | 1C85 | D182 | E1B285 | 0 | 1 | т | ᲅ |
            | 1C84 | 1C85 | E1B284 | E1B285 | 0 | 1 | ᲄ | ᲅ |
            | A64A | 1C88 | EA998A | E1B288 | 0 | 1 | Ꙋ | ᲈ |
            | A64B | 1C88 | EA998B | E1B288 | 0 | 1 | ꙋ | ᲈ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 1C88 | A64B | E1B288 | EA998B | 0 | 1 | ᲈ | ꙋ |
            | 1C86 | 042A | E1B286 | D0AA | 0 | 1 | ᲆ | Ъ |
            | 1C86 | 044A | E1B286 | D18A | 0 | 1 | ᲆ | ъ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 044A | 1C86 | D18A | E1B286 | 0 | 1 | ъ | ᲆ |
            | 1C87 | 0462 | E1B287 | D1A2 | 0 | 1 | ᲇ | Ѣ |
            | 1C87 | 0463 | E1B287 | D1A3 | 0 | 1 | ᲇ | ѣ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 0463 | 1C87 | D1A3 | E1B287 | 0 | 1 | ѣ | ᲇ |
            +-----------+-----------+--------+--------+----------+----------+------+------+
            {noformat}


            Note, toupper comparison considers more pairs as equal than tolower comparison:
            {code:sql}
            SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;
            {code}
            {noformat}
            +---------------+---------------+
            | SUM(eq_lower) | SUM(eq_upper) |
            +---------------+---------------+
            | 21 | 96 |
            +---------------+---------------+
            {noformat}

            h2. A more compact table with distinct pairs (the character with a smaller code point is on the left)
            {code:sql}
            SELECT
              HEX(CONVERT(a USING ucs2)) AS unicode_a,
              HEX(CONVERT(b USING ucs2)) AS unicode_b,
              HEX(a), HEX(b),
              BINARY LOWER(a)=LOWER(b) AS eq_lower,
              BINARY UPPER(a)=UPPER(b) AS eq_upper,
              a,b
            FROM
            (
              SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
            ) d1
            ORDER BY eq_lower, unicode_a, unicode_b;
            {code}
            {noformat}
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a | b |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            | 0049 | 0131 | 49 | C4B1 | 0 | 1 | I | ı |
            | 0053 | 017F | 53 | C5BF | 0 | 1 | S | ſ |
            | 00B5 | 039C | C2B5 | CE9C | 0 | 1 | µ | Μ |
            | 0345 | 0399 | CD85 | CE99 | 0 | 1 | ͅ | Ι |
            | 0392 | 03D0 | CE92 | CF90 | 0 | 1 | Β | ϐ |
            | 0395 | 03F5 | CE95 | CFB5 | 0 | 1 | Ε | ϵ |
            | 0398 | 03D1 | CE98 | CF91 | 0 | 1 | Θ | ϑ |
            | 0399 | 1FBE | CE99 | E1BEBE | 0 | 1 | Ι | ι |
            | 039A | 03F0 | CE9A | CFB0 | 0 | 1 | Κ | ϰ |
            | 03A0 | 03D6 | CEA0 | CF96 | 0 | 1 | Π | ϖ |
            | 03A1 | 03F1 | CEA1 | CFB1 | 0 | 1 | Ρ | ϱ |
            | 03A3 | 03C2 | CEA3 | CF82 | 0 | 1 | Σ | ς |
            | 03A6 | 03D5 | CEA6 | CF95 | 0 | 1 | Φ | ϕ |
            | 0412 | 1C80 | D092 | E1B280 | 0 | 1 | В | ᲀ |
            | 0414 | 1C81 | D094 | E1B281 | 0 | 1 | Д | ᲁ |
            | 041E | 1C82 | D09E | E1B282 | 0 | 1 | О | ᲂ |
            | 0421 | 1C83 | D0A1 | E1B283 | 0 | 1 | С | ᲃ |
            | 0422 | 1C84 | D0A2 | E1B284 | 0 | 1 | Т | ᲄ |
            | 042A | 1C86 | D0AA | E1B286 | 0 | 1 | Ъ | ᲆ |
            | 0462 | 1C87 | D1A2 | E1B287 | 0 | 1 | Ѣ | ᲇ |
            | 1C88 | A64A | E1B288 | EA998A | 0 | 1 | ᲈ | Ꙋ |
            | 0049 | 0130 | 49 | C4B0 | 1 | 0 | I | İ |
            | 004B | 212A | 4B | E284AA | 1 | 0 | K | K |
            | 00C5 | 212B | C385 | E284AB | 1 | 0 | Å | Å |
            | 00DF | 1E9E | C39F | E1BA9E | 1 | 0 | ß | ẞ |
            | 03A9 | 2126 | CEA9 | E284A6 | 1 | 0 | Ω | Ω |
            +-----------+-----------+--------+--------+----------+----------+-----+-----+
            26 rows in set (0.005 sec)
            {noformat}

            h2. Cons of toupper vs tolower comparison
            - The tolower variant will compare exactly like strcasecmp() did
            - The toupper variant will compare close to how utf8mb3_general_ci works
            bar Alexander Barkov made changes -
            ParadoxV5 Jimmy Hú made changes -
            ParadoxV5 Jimmy Hú made changes -
            ParadoxV5 Jimmy Hú made changes -
            Roel Roel Van de Paar made changes -

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.