MariaDB Server, MDEV-5241

Collation incompatibilities with MySQL-5.6

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.4, 5.5.33a
    • Fix Version/s: 10.0.6
    • Component/s: None
    • Labels:
      None

      Description

      There are incompatibilities between some MariaDB and MySQL collations
      which we need to solve somehow.

      Problems

      1.

      The utf8_croatian_ci and ucs2_croatian_ci collations appeared in MariaDB-5.1 at the end of 2009, based on Alexander Barkov's patch from: http://collation-charts.org/articles/croatian.htm

      Later, the Croatian collations were added into MySQL-5.6.

      However, the MariaDB Croatian collations use the latest version of the rules from http://unicode.org/cldr/trac/browser/trunk/common/collation/hr.xml while MySQL implements an older version.

      The rules differ in only 3 letters, but that is enough to make the indexes incompatible.

      As a result:

      • utf8_croatian_ci (ID 213) is different in MariaDB and MySQL
      • ucs2_croatian_ci (ID 149) is different in MariaDB and MySQL

      2.

      Later, MySQL-5.5 added support for utf8mb4, utf16, utf32. When merging the new character sets (MySQL-5.5 -> MariaDB-5.5) the MariaDB team added the following corresponding collations, for symmetry with utf8 and ucs2:

      • utf8mb4_croatian_ci (ID=245)
      • utf16_croatian_ci (ID=215)
      • utf32_croatian_ci (ID=214)

      But when collations with the same names finally appeared in MySQL-5.6, they were given different IDs. So IDs 214 and 215 are assigned in MySQL-5.6 to other collations, while ID 245 keeps the same name but implements different rules.

      This is what we have in MariaDB:

      mysql> SELECT COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
         --> WHERE COLLATION_NAME LIKE 'u%croat%';
      +---------------------+-----+
      | COLLATION_NAME      | ID  |
      +---------------------+-----+
      | ucs2_croatian_ci    | 149 |
      | utf8_croatian_ci    | 213 |
      | utf32_croatian_ci   | 214 |
      | utf16_croatian_ci   | 215 |
      | utf8mb4_croatian_ci | 245 |
      +---------------------+-----+
      5 rows in set (0.01 sec)

      This is what we have in MySQL-5.6:

      mysql> SELECT COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
         --> WHERE ID IN (149,213,214,215,245);
      +---------------------+-----+
      | COLLATION_NAME      | ID  | Problem:
      +---------------------+-----+
      | ucs2_croatian_ci    | 149 | MySQL rules differ from MariaDB rules
      | utf8_croatian_ci    | 213 | MySQL rules differ from MariaDB rules
      | utf8_unicode_520_ci | 214 | MariaDB utf32_croatian_ci
      | utf8_vietnamese_ci  | 215 | MariaDB utf16_croatian_ci
      | utf8mb4_croatian_ci | 245 | MySQL rules differ from MariaDB rules
      +---------------------+-----+
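
      The comparison of the two listings can be sketched in a few lines of Python. This is only an illustration built from the ID/name pairs shown above; the helper name split_by_name is hypothetical. It compares names only, so for the IDs whose names match, the rule difference is taken from this report and is not detected by the code.

```python
# Collation IDs copied from the two query results above.
MARIADB = {
    149: "ucs2_croatian_ci",
    213: "utf8_croatian_ci",
    214: "utf32_croatian_ci",
    215: "utf16_croatian_ci",
    245: "utf8mb4_croatian_ci",
}
MYSQL56 = {
    149: "ucs2_croatian_ci",
    213: "utf8_croatian_ci",
    214: "utf8_unicode_520_ci",
    215: "utf8_vietnamese_ci",
    245: "utf8mb4_croatian_ci",
}


def split_by_name(mariadb, mysql):
    """Split MariaDB IDs into name matches and IDs reused for another name."""
    same_name = []  # same name on both sides (rules still differ per this report)
    reused = {}     # id -> (MariaDB name, MySQL name)
    for cid in sorted(mariadb):
        if mysql.get(cid) == mariadb[cid]:
            same_name.append(cid)
        else:
            reused[cid] = (mariadb[cid], mysql.get(cid))
    return same_name, reused


same_name, reused = split_by_name(MARIADB, MYSQL56)
print(same_name)  # [149, 213, 245]
for cid, (maria_name, mysql_name) in reused.items():
    print(cid, maria_name, "->", mysql_name)
```

      IDs 214 and 215 end up in the second group: a table written by one server would be opened by the other with an unrelated collation.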

      Solution

      Collation changes

      • Bar moves MariaDB-5.5 xxx_croatian_ci collations to new IDs (preferably, outside of the 0..255 range), without changing the collation name.
      • Bar merges MySQL-5.6 xxx_croatian_ci using MySQL-5.6 IDs, but changing the names to xxx_croatian_mysql56_ci.

      Detect attempts to open tables with the old MariaDB collations.

      Bar fixes TABLE_SHARE::init_from_binary_frm_image() and adds an error message for any table created by a MariaDB version prior to 10.0.5 that has indexes using collation IDs 213, 149, 245, 215, 214:

      +---------------------+---------+-----+---------+----------+---------+
      | Collation           | Charset | Id  | Default | Compiled | Sortlen |
      +---------------------+---------+-----+---------+----------+---------+
      | utf8_croatian_ci    | utf8    | 213 |         | Yes      |       8 |
      | ucs2_croatian_ci    | ucs2    | 149 |         | Yes      |       8 |
      | utf8mb4_croatian_ci | utf8mb4 | 245 |         | Yes      |       8 |
      | utf16_croatian_ci   | utf16   | 215 |         | Yes      |       8 |
      | utf32_croatian_ci   | utf32   | 214 |         | Yes      |       8 |
      +---------------------+---------+-----+---------+----------+---------+

      ER_TABLE_NEEDS_UPGRADE looks suitable for this purpose:

      "Table upgrade required. Please do \"REPAIR TABLE `%-.32s`\" or dump/reload to fix it!"

      mysql_upgrade

      Monty will try to fix REPAIR to solve the conflicting IDs problem.

      quick REPAIR

      In the long term, we can add a quick REPAIR that replaces collation IDs in table definitions in FRM files and in engine-specific structure definitions (e.g. in MYI files for MyISAM) without having to do a full repair of the table.

              People

              Assignee:
              bar Alexander Barkov
              Reporter:
              serg Sergei Golubchik