Details

    Description

      MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

      For example, these three scripts:

      CREATE TABLE t1 (a CHAR(5));
      INSERT INTO t1 VALUES ('a'),('a     ');
      SELECT * FROM t1 WHERE a='a ';
      

      CREATE TABLE t1 (a VARCHAR(5));
      INSERT INTO t1 VALUES ('a'),('a     ');
      SELECT * FROM t1 WHERE a='a ';
      

      CREATE TABLE t1 (a TEXT);
      INSERT INTO t1 VALUES ('a'),('a     ');
      SELECT * FROM t1 WHERE a='a ';
      

      ignore trailing spaces in the column value and in the string constant and return two rows as a result.

      This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD.

      The SQL standard says:

      The comparison of two character string expressions depends on the collation used for the comparison (see Subclause 9.13, “Collation determination”). When values of unequal length are compared, if the collation for the comparison has the NO PAD characteristic and the shorter value is equal to some prefix of the longer value, then the shorter value is considered less than the longer value. If the collation for the comparison has the PAD SPACE characteristic, for the purposes of the comparison, the shorter value is effectively extended to the length of the longer by concatenation of <space>s on the right.

      MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

      In certain cases it would be nice to take trailing spaces into account.

      Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

      Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.

      New collation names will have the nopad substring. The list of all new desired collations can be extracted using this SQL query:

      SELECT
        CHARACTER_SET_NAME,
        REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
        CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
      FROM INFORMATION_SCHEMA.CHARACTER_SETS
      WHERE CHARACTER_SET_NAME<>'binary';
      

      +--------------------+---------------------------+--------------------+
      | CHARACTER_SET_NAME | NEW_NAME1                 | NEW_NAME2          |
      +--------------------+---------------------------+--------------------+
      | big5               | big5_chinese_nopad_ci     | big5_nopad_bin     |
      | dec8               | dec8_swedish_nopad_ci     | dec8_nopad_bin     |
      | cp850              | cp850_general_nopad_ci    | cp850_nopad_bin    |
      | hp8                | hp8_english_nopad_ci      | hp8_nopad_bin      |
      | koi8r              | koi8r_general_nopad_ci    | koi8r_nopad_bin    |
      | latin1             | latin1_swedish_nopad_ci   | latin1_nopad_bin   |
      | latin2             | latin2_general_nopad_ci   | latin2_nopad_bin   |
      | swe7               | swe7_swedish_nopad_ci     | swe7_nopad_bin     |
      | ascii              | ascii_general_nopad_ci    | ascii_nopad_bin    |
      | ujis               | ujis_japanese_nopad_ci    | ujis_nopad_bin     |
      | sjis               | sjis_japanese_nopad_ci    | sjis_nopad_bin     |
      | hebrew             | hebrew_general_nopad_ci   | hebrew_nopad_bin   |
      | tis620             | tis620_thai_nopad_ci      | tis620_nopad_bin   |
      | euckr              | euckr_korean_nopad_ci     | euckr_nopad_bin    |
      | koi8u              | koi8u_general_nopad_ci    | koi8u_nopad_bin    |
      | gb2312             | gb2312_chinese_nopad_ci   | gb2312_nopad_bin   |
      | greek              | greek_general_nopad_ci    | greek_nopad_bin    |
      | cp1250             | cp1250_general_nopad_ci   | cp1250_nopad_bin   |
      | gbk                | gbk_chinese_nopad_ci      | gbk_nopad_bin      |
      | latin5             | latin5_turkish_nopad_ci   | latin5_nopad_bin   |
      | armscii8           | armscii8_general_nopad_ci | armscii8_nopad_bin |
      | utf8               | utf8_general_nopad_ci     | utf8_nopad_bin     |
      | ucs2               | ucs2_general_nopad_ci     | ucs2_nopad_bin     |
      | cp866              | cp866_general_nopad_ci    | cp866_nopad_bin    |
      | keybcs2            | keybcs2_general_nopad_ci  | keybcs2_nopad_bin  |
      | macce              | macce_general_nopad_ci    | macce_nopad_bin    |
      | macroman           | macroman_general_nopad_ci | macroman_nopad_bin |
      | cp852              | cp852_general_nopad_ci    | cp852_nopad_bin    |
      | latin7             | latin7_general_nopad_ci   | latin7_nopad_bin   |
      | utf8mb4            | utf8mb4_general_nopad_ci  | utf8mb4_nopad_bin  |
      | cp1251             | cp1251_general_nopad_ci   | cp1251_nopad_bin   |
      | utf16              | utf16_general_nopad_ci    | utf16_nopad_bin    |
      | utf16le            | utf16le_general_nopad_ci  | utf16le_nopad_bin  |
      | cp1256             | cp1256_general_nopad_ci   | cp1256_nopad_bin   |
      | cp1257             | cp1257_general_nopad_ci   | cp1257_nopad_bin   |
      | utf32              | utf32_general_nopad_ci    | utf32_nopad_bin    |
      | geostd8            | geostd8_general_nopad_ci  | geostd8_nopad_bin  |
      | cp932              | cp932_japanese_nopad_ci   | cp932_nopad_bin    |
      | eucjpms            | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin  |
      +--------------------+---------------------------+--------------------+
      39 rows in set (0.00 sec)
      

      Implementation details

      Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
      utf8_general_ci is implemented in strings/ctype-utf8.c.
      Adding utf8_general_nopad_ci can be done in these three steps:

      1. Add a new collation handler

      Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

      static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
      {
          NULL,               /* init */
          my_strnncoll_utf8_general_ci,
          my_strnncollsp_utf8_general_ci,
          my_strnxfrm_unicode,
          my_strnxfrmlen_unicode,
          my_like_range_mb,
          my_wildcmp_utf8,
          my_strcasecmp_utf8,
          my_instr_mb,
          my_hash_sort_utf8,
          my_propagate_complex
      };
      

      and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

      • my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
      • my_hash_sort_utf8 - this is used for HASH indexes
      • my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

      All other functions can be reused from the existing PAD SPACE collation.

      So the new handler will look about like this:

      static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
      {
        NULL,               /* init */
        my_strnncoll_utf8_general_ci,
        my_strnncollsp_utf8_general_nopad_ci, /* a new function */
        my_strnxfrm_unicode_nopad,           /* a new function */
        my_strnxfrmlen_unicode,
        my_like_range_mb,
        my_wildcmp_utf8,
        my_strcasecmp_utf8,
        my_instr_mb,
        my_hash_sort_utf8_nopad,              /* a new function */
        my_propagate_complex
      };
      

      2. Add a new collation definition by copying it from my_charset_utf8_general_ci

      The new definition will look like this:

      struct charset_info_st my_charset_utf8_general_nopad_ci=
      {
        333,0,0,             /* number       */
        MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD,  /* state  */
        "utf8",             /* cs name      */
        "utf8_general_nopad_ci",  /* name         */
        "",                 /* comment      */
        NULL,               /* tailoring    */
        ctype_utf8,         /* ctype        */
        to_lower_utf8,      /* to_lower     */
        to_upper_utf8,      /* to_upper     */
        to_upper_utf8,      /* sort_order   */
        NULL,               /* uca          */
        NULL,               /* tab_to_uni   */
        NULL,               /* tab_from_uni */
        &my_unicase_default,/* caseinfo     */
        NULL,               /* state_map    */
        NULL,               /* ident_map    */
        1,                  /* strxfrm_multiply */
        1,                  /* caseup_multiply  */
        1,                  /* casedn_multiply  */
        1,                  /* mbminlen     */
        3,                  /* mbmaxlen     */
        0,                  /* min_sort_char */
        0xFFFF,             /* max_sort_char */
        ' ',                /* pad char      */
        0,                  /* escape_with_backslash_is_dangerous */
        1,                  /* levels_for_order   */
        &my_charset_utf8_handler,
        &my_collation_utf8_general_nopad_ci_handler
      };
      

      Notice, it looks very similar to the definition of my_charset_utf8_general_ci but

      • has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
      • has a new distinct name: utf8_general_nopad_ci
      • has an extra MY_CS_NOPAD flag
      • does not have MY_CS_PRIMARY in flags
      • uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

      3. Add a collation initialization code

      Open mysys/charset-def.c and add an initialization line like this:

      #ifdef HAVE_CHARSET_utf8
        add_compiled_collation(&my_charset_utf8_general_ci);
        add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
        add_compiled_collation(&my_charset_utf8_bin);
        add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
      

      Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
      Collations for 8-bit character sets are to be done in a different way, see section 4.

      Testing

      The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
      Tests will cover all new collations, in all parts of SQL queries involving comparison, including:

      • Unique indexes
      • UNION DISTINCT
      • DISTINCT (with and without indexes)
      • ORDER BY (with and without indexes)
      • GROUP BY (with and without indexes)
      • Mixing NO PAD and PAD collations
      • Aggregate functions
        • MIIN(), MAX()
        • COUNT(DISTINCT)
        • GROUP_CONCAT(DISTINCT)
      • SQL functions involving comparison
        • LEAST(), GREATEST()
        • IF()
        • NULLIF()
        • CASE WHEN a = b THEN ...
        • CASE a WHEN a0 THEN ...

      Tests should cover MyISAM, HEAP and InnoDB tables.

      To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
      which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

      Proposed development order

      As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
      Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:

      • implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
      • Tests, as described in "Testing"

      1. Collations for the Unicode character sets:

      • utf8
      • utf8mb4
      • utf16
      • utf16le
      • utf32

      2. Collations for the Asian character sets

      • big5
      • cp932
      • eucjpms
      • euckr
      • gb2312
      • gbk
      • sjis
      • ujis

      3. Collations for latin1

      Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
      This step will start with introducing new shared handlers:

      • my_collation_8bit_simple_nopad_ci_handler
      • my_collation_8bit_nopad_bin_handler

      and use these handlers to actually add new collations for latin1.
      Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

      4. Collations for the other 8-bit character sets:

      • dec8
      • cp850
      • hp8
      • koi8r
      • latin2
      • swe7
      • ascii
      • hebrew
      • tis620
      • koi8u
      • greek
      • cp1250
      • latin5
      • armscii8
      • cp866
      • keybcs2
      • macce
      • macroman
      • cp852
      • latin7
      • cp1251
      • cp1256
      • cp1257
      • geostd8

      8-bit collations are defined in ctype-extra.c.
      Unlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

      Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:

      cd strings
      make conf_to_src
      ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
      # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
      mv ctype-extra2.c ctype-extra.c
      

      New 8-bit collations should also be generated from the collation definition files.

      We'll need the following changes:

      4a. Make collation definition loader understand a new "nopad" flag.

      The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):

      <charset name="greek">
        ...
        <collation name="greek_general_ci"    id="25" order="Greek"   flag="primary"/>
        <collation name="greek_bin"           id="70" order="Binary"  flag="binary"/>
        <collation name="greek_nopad_bin"     id="xxx"  flag="binary" flag="nopad"/>
      </charset>
      

      4b. Make collation definition loader reuse weight tables

      New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
      For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:

      <collation name="greek_general_ci">
      <map>
       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
       10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
       20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
       30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
       40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
       50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
       60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
       50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
       80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
       90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
       A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
       B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
       C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
       D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
       D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
       D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
      </map>
      </collation>
      

      It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
      Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

      The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
      But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

      <charset name="greek">
        ...
        <collation name="greek_general_ci"    id="25" order="Greek"   flag="primary"/>
        <collation name="greek_bin"           id="70" order="Binary"  flag="binary"/>
        <collation name="greek_general_nopad_ci"    id="xxx"  flag="nopad" map="greek_general_ci"/>
        <collation name="greek_nopad_bin"           id="xxx"  flag="binary" flag="nopad"/>
      </charset>
      

      Notice, the "map" attribute has another collation name rather than a full weight table definition.

      4c. Add new 8-bit collations

      At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:

      • Edit Index.xml:
        • Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
        • Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
      • Extend this block in conf_to_src.c:

          if (cs->state & MY_CS_BINSORT)
            fprintf(f,"  &my_collation_8bit_bin_handler,\n");
          else
            fprintf(f,"  &my_collation_8bit_simple_ci_handler,\n");
        

        to check the MY_CS_NOPAD flag and print my_collation_8bit_nopad_bin_handler and my_collation_8bit_simple_nopad_ci_handler (which we added at step 3) when MY_CS_NOPAD is set.

      • Regenerate ctype-extra.c, as described in the beginning of the section 4.
      • Add tests for the new 8bit collations.

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT CHARACTER_SET_NAME, REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1, CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2 FROM INFORMATION_SCHEMA.CHARACTER_SETS WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {code}


            Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {code}


            Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {code}


            Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact ID may change)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact ID may change)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler instead of my_collation_utf8_general_ci_handler

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact ID may change)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler (instead of my_collation_utf8_general_ci_handler)

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            serg Sergei Golubchik made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}


            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:


            {code}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              888,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact ID may change)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra MY_CS_NOPAD flag
            - uses the new collation handler my_collation_utf8_general_nopad_ci_handler (instead of my_collation_utf8_general_ci_handler)

            3. Add a collation initialization code

            Open mysys/charset-def.c and add an initialization line like this:

            {code}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}


            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            serg Sergei Golubchik made changes -
            Labels gsoc16
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID=333 (the exact ID may change)
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact ID may change)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact ID may change)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its pad attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for the default and the _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for the default and the _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for the default and the _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.
            We want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for the default and the _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for the default and the _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for the default and the _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_PRIMARY|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Function output
            - GROUP_CONCAT(DISTINCT)

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - MIIN(), MAX()
            - COUNT(DISTINCT)
            - GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - MIIN(), MAX()
            - COUNT(DISTINCT)
            - GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY with indexes
            - GROUP BY without indexes
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries including:
            - Unique indexes
            - DISTINCT
            - UNION
            - ORDER BY with indexes
            - ORDER BY without indexes
            - GROUP BY with indexes
            - GROUP BY without indexes
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler
            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler
            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New utf8 collations should be generated from the collation definition files.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New utf8 collations should be generated from the collation definition files.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New utf8 collations should be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml look like this:
            {noformat|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {noformat}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New utf8 collations should be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml look like this:
            {noformat|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {noformat}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New utf8 collations should be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml look like this:
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New utf8 collations should be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml look like this:
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml look like this:
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml look like this:
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need some additions into handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" order="Greek" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" order="Binary" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml looks like this (notice new lines highlighted):
            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" flag="nopad" source="greek_general_ci"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
              <collation name="greek_nopad_bin" id="xxx" flag="nopad" source="greek_general_ci"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" source="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" source="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" source="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" source="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" source="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader understand a new attribute "source".
            New _ci collations should use exactly the same definition of mapping from character codes to their collating weight with their PAD SPACE counterparts.
            For example, sort map definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand a new attribute "source" to make it possible to share map definitions between collations, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" source="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight table
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.
            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            4c. Add new 8-bit collations
            At this sub-step we'll use the changes in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Change Index.xml by adding new "nopad" collations for all character sets, as described in the example for "greek"
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            4c. Add new 8-bit collations
            At this sub-step we'll use the changes in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Change Index.xml by adding new "nopad" collations for all character sets, as described in the example for "greek"
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Change Index.xml by adding new "nopad" collations for all character sets, as described in the example for "greek"
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.
            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Change Index.xml by adding new "nopad" collations for all character sets, as described in the example for "greek"
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.


            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details"
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode, Asian and latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other character sets are to be added using about the same steps (some minor details may differ though).

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode, Asian and latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode, Asian and latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode, Asian and latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a.
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b.
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are implemented in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other collations, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files in sql/share/charsets/*.xml files.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definitions in files in sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definitions in files in sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weight) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weights) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same weight definition (i.e. mapping from character codes to their collating weights) with their PAD SPACE counterparts.
            For example, weight table definition for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.

            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.

            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste its definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the MY_CS_NOPAD flag and print my_collation_8bit_nopad_bin_handler and my_collation_8bit_simple_nopad_ci_handler (which we added at step 3).
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the MY_CS_NOPAD flag and print my_collation_8bit_nopad_bin_handler and my_collation_8bit_simple_nopad_ci_handler (which we added at step 3).
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the MY_CS_NOPAD flag and print *my_collation_8bit_nopad_bin_handler* and *my_collation_8bit_simple_nopad_ci_handler* (which we added at step 3).
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the MY_CS_NOPAD flag and print *my_collation_8bit_nopad_bin_handler* and *my_collation_8bit_simple_nopad_ci_handler* (which we added at step 3).
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the MY_CS_NOPAD flag and print *my_collation_8bit_nopad_bin_handler* and *my_collation_8bit_simple_nopad_ci_handler* (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the MY_CS_NOPAD flag and print *my_collation_8bit_nopad_bin_handler* and *my_collation_8bit_simple_nopad_ci_handler* (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests for the new 8bit collations.

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Inlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests for the new 8bit collations.

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Unlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests for the new 8bit collations.

            bar Alexander Barkov made changes -
            Description MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD. MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Unlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests for the new 8bit collations.

            MariaDB currently ignores trailing spaces when comparing values of the CHAR, VARCHAR or TEXT data types.

            For example, these three scripts:
            {code:sql}
            CREATE TABLE t1 (a CHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a VARCHAR(5));
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            {code:sql}
            CREATE TABLE t1 (a TEXT);
            INSERT INTO t1 VALUES ('a'),('a ');
            SELECT * FROM t1 WHERE a='a ';
            {code}
            ignore trailing spaces in the column value and in the string constant and return two rows as a result.

            This is correct. According to the standard, the behavior of trailing space comparison does not depend on the data type. It depends only on the collation, and namely on its PAD attribute, which can be PAD SPACE or NO PAD.

            The SQL standard says:
            {quote}
            The comparison of two character string expressions depends on the collation used for the comparison (see Subclause 9.13, “Collation determination”). When values of unequal length are compared, if the collation for the comparison has the NO PAD characteristic and the shorter value is equal to some prefix of the longer value, then the shorter value is considered less than the longer value. If the collation for the comparison has the PAD SPACE characteristic, for the purposes of the comparison, the shorter value is effectively extended to the length of the longer by concatenation of <space>s on the right.
            {quote}
            MariaDB currently has PAD SPACE collations only, so trailing spaces are always ignored for the CHAR, VARCHAR and TEXT data types.

            In certain cases it would be nice to take trailing spaces into account.

            Under terms of this task, we want to add NO PAD variants for all default collations and for all _bin collations.

            {quote}
            Note: We eventually want to add NO PAD variants for all other collations. But this will be done separately. This task is only about adding NO PAD variants for default and _bin collations.
            {quote}
            New collation names will have the _nopad_ substring. The list of all new desired collations can be extracted using this SQL query:

            {code:sql}
            SELECT
              CHARACTER_SET_NAME,
              REGEXP_REPLACE(DEFAULT_COLLATE_NAME,'_[^_]*$','_nopad\\0') AS NEW_NAME1,
              CONCAT(CHARACTER_SET_NAME,'_nopad_bin') AS NEW_NAME2
            FROM INFORMATION_SCHEMA.CHARACTER_SETS
            WHERE CHARACTER_SET_NAME<>'binary';
            {code}
            {noformat}
            +--------------------+---------------------------+--------------------+
            | CHARACTER_SET_NAME | NEW_NAME1 | NEW_NAME2 |
            +--------------------+---------------------------+--------------------+
            | big5 | big5_chinese_nopad_ci | big5_nopad_bin |
            | dec8 | dec8_swedish_nopad_ci | dec8_nopad_bin |
            | cp850 | cp850_general_nopad_ci | cp850_nopad_bin |
            | hp8 | hp8_english_nopad_ci | hp8_nopad_bin |
            | koi8r | koi8r_general_nopad_ci | koi8r_nopad_bin |
            | latin1 | latin1_swedish_nopad_ci | latin1_nopad_bin |
            | latin2 | latin2_general_nopad_ci | latin2_nopad_bin |
            | swe7 | swe7_swedish_nopad_ci | swe7_nopad_bin |
            | ascii | ascii_general_nopad_ci | ascii_nopad_bin |
            | ujis | ujis_japanese_nopad_ci | ujis_nopad_bin |
            | sjis | sjis_japanese_nopad_ci | sjis_nopad_bin |
            | hebrew | hebrew_general_nopad_ci | hebrew_nopad_bin |
            | tis620 | tis620_thai_nopad_ci | tis620_nopad_bin |
            | euckr | euckr_korean_nopad_ci | euckr_nopad_bin |
            | koi8u | koi8u_general_nopad_ci | koi8u_nopad_bin |
            | gb2312 | gb2312_chinese_nopad_ci | gb2312_nopad_bin |
            | greek | greek_general_nopad_ci | greek_nopad_bin |
            | cp1250 | cp1250_general_nopad_ci | cp1250_nopad_bin |
            | gbk | gbk_chinese_nopad_ci | gbk_nopad_bin |
            | latin5 | latin5_turkish_nopad_ci | latin5_nopad_bin |
            | armscii8 | armscii8_general_nopad_ci | armscii8_nopad_bin |
            | utf8 | utf8_general_nopad_ci | utf8_nopad_bin |
            | ucs2 | ucs2_general_nopad_ci | ucs2_nopad_bin |
            | cp866 | cp866_general_nopad_ci | cp866_nopad_bin |
            | keybcs2 | keybcs2_general_nopad_ci | keybcs2_nopad_bin |
            | macce | macce_general_nopad_ci | macce_nopad_bin |
            | macroman | macroman_general_nopad_ci | macroman_nopad_bin |
            | cp852 | cp852_general_nopad_ci | cp852_nopad_bin |
            | latin7 | latin7_general_nopad_ci | latin7_nopad_bin |
            | utf8mb4 | utf8mb4_general_nopad_ci | utf8mb4_nopad_bin |
            | cp1251 | cp1251_general_nopad_ci | cp1251_nopad_bin |
            | utf16 | utf16_general_nopad_ci | utf16_nopad_bin |
            | utf16le | utf16le_general_nopad_ci | utf16le_nopad_bin |
            | cp1256 | cp1256_general_nopad_ci | cp1256_nopad_bin |
            | cp1257 | cp1257_general_nopad_ci | cp1257_nopad_bin |
            | utf32 | utf32_general_nopad_ci | utf32_nopad_bin |
            | geostd8 | geostd8_general_nopad_ci | geostd8_nopad_bin |
            | cp932 | cp932_japanese_nopad_ci | cp932_nopad_bin |
            | eucjpms | eucjpms_japanese_nopad_ci | eucjpms_nopad_bin |
            +--------------------+---------------------------+--------------------+
            39 rows in set (0.00 sec)
            {noformat}


            h1. Implementation details


            Suppose we need to add utf8_general_nopad_ci, which will be based on utf8_general_ci but will have the NO PAD attribute.
            utf8_general_ci is implemented in strings/ctype-utf8.c.
            Adding utf8_general_nopad_ci can be done in these three steps:

            1. Add a new collation handler

            Copy the collation handler from the existing my_collation_utf8_general_ci_handler which looks like this:

            {code:c}
            static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler =
            {
                NULL, /* init */
                my_strnncoll_utf8_general_ci,
                my_strnncollsp_utf8_general_ci,
                my_strnxfrm_unicode,
                my_strnxfrmlen_unicode,
                my_like_range_mb,
                my_wildcmp_utf8,
                my_strcasecmp_utf8,
                my_instr_mb,
                my_hash_sort_utf8,
                my_propagate_complex
            };
            {code}

            and then replace these three virtual functions to new similar functions that will not ignore trailing spaces:

            - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes
            - my_hash_sort_utf8 - this is used for HASH indexes
            - my_strnxfrm_unicode - this is used for filesort (non-indexed ORDER BY)

            All other functions can be reused from the existing PAD SPACE collation.

            So the new handler will look about like this:

            {code:c|highlight=5,6,12}
            static MY_COLLATION_HANDLER my_collation_utf8_general_nopad_ci_handler =
            {
              NULL, /* init */
              my_strnncoll_utf8_general_ci,
              my_strnncollsp_utf8_general_nopad_ci, /* a new function */
              my_strnxfrm_unicode_nopad, /* a new function */
              my_strnxfrmlen_unicode,
              my_like_range_mb,
              my_wildcmp_utf8,
              my_strcasecmp_utf8,
              my_instr_mb,
              my_hash_sort_utf8_nopad, /* a new function */
              my_propagate_complex
            };
            {code}

            2. Add a new collation definition by copying it from my_charset_utf8_general_ci

            The new definition will look like this:

            {code:c|highlight=3,4,30}
            struct charset_info_st my_charset_utf8_general_nopad_ci=
            {
              333,0,0, /* number */
              MY_CS_COMPILED|MY_CS_STRNXFRM|MY_CS_UNICODE|MY_CS_NOPAD, /* state */
              "utf8", /* cs name */
              "utf8_general_nopad_ci", /* name */
              "", /* comment */
              NULL, /* tailoring */
              ctype_utf8, /* ctype */
              to_lower_utf8, /* to_lower */
              to_upper_utf8, /* to_upper */
              to_upper_utf8, /* sort_order */
              NULL, /* uca */
              NULL, /* tab_to_uni */
              NULL, /* tab_from_uni */
              &my_unicase_default,/* caseinfo */
              NULL, /* state_map */
              NULL, /* ident_map */
              1, /* strxfrm_multiply */
              1, /* caseup_multiply */
              1, /* casedn_multiply */
              1, /* mbminlen */
              3, /* mbmaxlen */
              0, /* min_sort_char */
              0xFFFF, /* max_sort_char */
              ' ', /* pad char */
              0, /* escape_with_backslash_is_dangerous */
              1, /* levels_for_order */
              &my_charset_utf8_handler,
              &my_collation_utf8_general_nopad_ci_handler
            };
            {code}

            Notice, it looks very similar to the definition of my_charset_utf8_general_ci but
            - has a new distinct ID: 333 (the exact IDs for all new collations will be chosen later)
            - has a new distinct name: utf8_general_nopad_ci
            - has an extra {{MY_CS_NOPAD}} flag
            - does not have {{MY_CS_PRIMARY}} in flags
            - uses the new collation handler {{my_collation_utf8_general_nopad_ci_handler}} instead of {{my_collation_utf8_general_ci_handler}}

            3. Add a collation initialization code

            Open {{mysys/charset-def.c}} and add an initialization line like this:

            {code:c|highlight=3}
            #ifdef HAVE_CHARSET_utf8
              add_compiled_collation(&my_charset_utf8_general_ci);
              add_compiled_collation(&my_charset_utf8_general_nopad_ci); /* This is the new line */
              add_compiled_collation(&my_charset_utf8_bin);
              add_compiled_collation(&my_charset_utf8_general_mysql500_ci);
            {code}

            Collations for the other Unicode and Asian character sets and for latin1 are to be added using about the same steps (some minor details may differ though).
            Collations for 8-bit character sets are to be done in a different way, see section 4.

            h1. Testing
            The task will include tests for the MariaDB "mtr" test infrastructure in the mysql-test source directory.
            Tests will cover all new collations, in all parts of SQL queries involving comparison, including:
            - Unique indexes
            - UNION DISTINCT
            - DISTINCT (with and without indexes)
            - ORDER BY (with and without indexes)
            - GROUP BY (with and without indexes)
            - Mixing NO PAD and PAD collations
            - Aggregate functions
            -- MIIN(), MAX()
            -- COUNT(DISTINCT)
            -- GROUP_CONCAT(DISTINCT)
            - SQL functions involving comparison
            -- LEAST(), GREATEST()
            -- IF()
            -- NULLIF()
            -- CASE WHEN a = b THEN ...
            -- CASE a WHEN a0 THEN ...

            Tests should cover MyISAM, HEAP and InnoDB tables.

            To simplify adding test, we'll create a shared include file, say mysql-test/include/ctype_pad.inc,
            which will be then included in all mysql-test/t/ctype_xxx.test files for individual character sets. See ctype_regex.inc as an example of such shared file.

            h1. Proposed development order
            As collations are quite stand-alone pieces of code and do not affect each other, it's easier to implement new collations one by one, consequently, in separate commits.
            Every commit can include collations xxx_nopad_ci and xxx_bin for a single character set and should consist of:
            - implementation, as described in "Implementation details" (for Unicode and Asian character sets and for latin1) and in section 4 (for 8-bit character sets)
            - Tests, as described in "Testing"

            h2. 1. Collations for the Unicode character sets:
            - utf8
            - utf8mb4
            - utf16
            - utf16le
            - utf32

            h2. 2. Collations for the Asian character sets
            - big5
            - cp932
            - eucjpms
            - euckr
            - gb2312
            - gbk
            - sjis
            - ujis

            h2. 3. Collations for latin1
            Unlike all other 8-bit character sets, latin1 has a special implementation in ctype-latin1.c, so it's easier to have it in a separate step.
            This step will start with introducing new shared handlers:
            - my_collation_8bit_simple_nopad_ci_handler
            - my_collation_8bit_nopad_bin_handler

            and use these handlers to actually add new collations for latin1.
            Note, the handlers added at this step will be reused for all other 8-bit character sets at the next step.

            h2. 4. Collations for the other 8-bit character sets:
            - dec8
            - cp850
            - hp8
            - koi8r
            - latin2
            - swe7
            - ascii
            - hebrew
            - tis620
            - koi8u
            - greek
            - cp1250
            - latin5
            - armscii8
            - cp866
            - keybcs2
            - macce
            - macroman
            - cp852
            - latin7
            - cp1251
            - cp1256
            - cp1257
            - geostd8

            8-bit collations are defined in ctype-extra.c.
            Unlike all other ctype-xxxx.c files, this file is not manually written. It's generated from collation definition files sql/share/charsets/*.xml.

            Whenever we modify the collation definition files, we do the following procedure to regenerate ctype-extra.c:
            {noformat}
            cd strings
            make conf_to_src
            ./conf_to_src ../sql/share/charsets/ >ctype-extra2.c
            # now make sure that ctype-extra2.c is OK, e.g. "diff ctype-extra.c ctype-extra2.c"
            mv ctype-extra2.c ctype-extra.c
            {noformat}
            New 8-bit collations should also be generated from the collation definition files.

            We'll need the following changes:
            h3. 4a. Make collation definition loader understand a new "nopad" flag.
            The collation definition loader (implemented in ctype.c) should be extended to handle a new flag "nopad", so Index.xml can look like this (notice a new line highlighted):
            {code:xml|highlight=5}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            h3. 4b. Make collation definition loader reuse weight tables
            New _ci collations should use exactly the same collating weights with their PAD SPACE counterparts.
            For example, a collating weight table for greek_general_ci resides in greek.xml and looks like this:
            {code:xml}
            <collation name="greek_general_ci">
            <map>
             00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
             20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
             30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
             40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
             60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
             50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
             80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
             90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
             A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
             B0 B1 B2 B3 B4 B5 C1 B7 C5 C7 C9 BB CF BD D5 D9
             C9 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 C9 D5 C1 C5 C7 C9
             D5 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
             D0 D1 D3 D3 D4 D5 D6 D7 D8 D9 C9 D5 CF D5 D9 FF
            </map>
            </collation>
            {code}
            It consists of 256 weights, one weight per code. The first weight corresponds the character with the code 0x00, the last weight corresponds to the character with the code 0xFF.
            Notice, the characters 0x41 and 0x61 (Latin letters 'a' and 'A') have the same weight of 0x41. This makes the collation case insensitive.

            The easiest way would be just to copy-and-paste the weight table definition and use the copy with a new name greek_general_nopad_ci.
            But it would be nice to avoid duplication of definitions. We'll extend ctype.c to understand references to other collations instead of explicit weight table definitions, so a new block for the "greek" character set can look like this:

            {code:xml|highlight=5,6}
            <charset name="greek">
              ...
              <collation name="greek_general_ci" id="25" order="Greek" flag="primary"/>
              <collation name="greek_bin" id="70" order="Binary" flag="binary"/>
              <collation name="greek_general_nopad_ci" id="xxx" flag="nopad" map="greek_general_ci"/>
              <collation name="greek_nopad_bin" id="xxx" flag="binary" flag="nopad"/>
            </charset>
            {code}
            Notice, the "map" attribute has another collation name rather than a full weight table definition.

            h3. 4c. Add new 8-bit collations
            At this sub-step we'll use the changes made in the collation definition loader mentioned in 4a and 4b, and do the following:
            - Edit Index.xml:
            -- Add new _nopad_bin collations to all 8-bit character sets, as described in the example for "greek" in 4a (one collation per character set)
            -- Add new _nopad_ci collations to all 8-bit character sets, as described in the example for "greek" in 4b (one collation per character set)
            - Extend this block in conf_to_src.c:
            {code:c}
              if (cs->state & MY_CS_BINSORT)
                fprintf(f," &my_collation_8bit_bin_handler,\n");
              else
                fprintf(f," &my_collation_8bit_simple_ci_handler,\n");
            {code}
            to check the {{MY_CS_NOPAD}} flag and print {{my_collation_8bit_nopad_bin_handler}} and {{my_collation_8bit_simple_nopad_ci_handler}} (which we added at step 3) when {{MY_CS_NOPAD}} is set.
             

            - Regenerate ctype-extra.c, as described in the beginning of the section 4.
            - Add tests for the new 8bit collations.

            monty Michael Widenius made changes -
            serg Sergei Golubchik made changes -
            alvinr Alvin Richards (Inactive) made changes -
            Labels gsoc16 Compatibility gsoc16
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.2.2-2 [ 92 ]
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            Fix Version/s 10.2.2 [ 22013 ]
            Fix Version/s 10.2 [ 14601 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            alvinr Alvin Richards (Inactive) made changes -
            Sprint 10.2.2-2 [ 92 ] 10.2.2-3, 10.2.2-2 [ 83, 92 ]
            alvinr Alvin Richards (Inactive) made changes -
            Epic Link MDEV-10872 [ 58182 ]
            bar Alexander Barkov made changes -
            alvinr Alvin Richards (Inactive) made changes -
            Labels Compatibility gsoc16 Approved Compatibility gsoc16
            Krishnadas Krishnadas made changes -
            Attachment MDEV-9711.zip [ 42681 ]
            Krishnadas Krishnadas added a comment -

            DBS test cases MDEV-9711.zip

            Krishnadas Krishnadas added a comment - DBS test cases MDEV-9711.zip
            alvinr Alvin Richards (Inactive) made changes -
            Labels Approved Compatibility gsoc16 Approved Compatibility NRE-307517 gsoc16
            alvinr Alvin Richards (Inactive) made changes -
            NRE Projects NRE-307517
            alvinr Alvin Richards (Inactive) made changes -
            Labels Approved Compatibility NRE-307517 gsoc16 Approved Compatibility gsoc16
            alvinr Alvin Richards (Inactive) made changes -
            NRE Approved Yes [ 10304 ]
            alvinr Alvin Richards (Inactive) made changes -
            Labels Approved Compatibility gsoc16 Compatibility gsoc16
            bar Alexander Barkov made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 74526 ] MariaDB v4 [ 132805 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.