[MDEV-5241] Collation incompatibilities with MySQL-5.6 Created: 2013-11-05  Updated: 2019-07-08  Resolved: 2013-11-14

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.4, 5.5.33a
Fix Version/s: 10.0.6

Type: Bug Priority: Major
Reporter: Sergei Golubchik Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-16945 main.ctype_upgrade failed in buildbot... Open

 Description   

There are incompatibilities between some MariaDB and MySQL collations
which we need to solve somehow.

Problems

1.

The utf8_croatian_ci and ucs2_croatian_ci collations appeared in MariaDB-5.1 at the end of 2009, based on Alexander Barkov's patch from: http://collation-charts.org/articles/croatian.htm

Later, the Croatian collations were added into MySQL-5.6.

However, the MariaDB Croatian collations use the latest version of the rules from http://unicode.org/cldr/trac/browser/trunk/common/collation/hr.xml while MySQL implements an older version.

The difference is in only 3 letters, but that is enough to make the indexes incompatible.

As a result:

  • utf8_croatian_ci (ID 213) is different in MariaDB and MySQL
  • ucs2_croatian_ci (ID 149) is different in MariaDB and MySQL

2.

Later, MySQL-5.5 added support for utf8mb4, utf16, utf32. When merging the new character sets (MySQL-5.5 -> MariaDB-5.5) the MariaDB team added the following corresponding collations, for symmetry with utf8 and ucs2:

  • utf8mb4_croatian_ci (ID=245)
  • utf16_croatian_ci (ID=215)
  • utf32_croatian_ci (ID=214)

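But when the collations with the same names finally appeared in MySQL-5.6, utf16_croatian_ci and utf32_croatian_ci were given different IDs. So the IDs 214 and 215 are assigned in MySQL-5.6 to something else, while ID 245 keeps the name utf8mb4_croatian_ci but follows the MySQL rules.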

This is what we have in MariaDB:

mysql> SELECT COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
   --> WHERE COLLATION_NAME LIKE 'u%croat%';
+---------------------+-----+
| COLLATION_NAME      | ID  |
+---------------------+-----+
| ucs2_croatian_ci    | 149 |
| utf8_croatian_ci    | 213 |
| utf32_croatian_ci   | 214 |
| utf16_croatian_ci   | 215 |
| utf8mb4_croatian_ci | 245 |
+---------------------+-----+
5 rows in set (0.01 sec)

This is what we have in MySQL-5.6:

mysql> SELECT COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
   --> WHERE ID IN (149,213,214,215,245);
+---------------------+-----+
| COLLATION_NAME      | ID  | Problem:
+---------------------+-----+
| ucs2_croatian_ci    | 149 | MySQL rules differ from MariaDB rules
| utf8_croatian_ci    | 213 | MySQL rules differ from MariaDB rules
| utf8_unicode_520_ci | 214 | MariaDB utf32_croatian_ci
| utf8_vietnamese_ci  | 215 | MariaDB utf16_croatian_ci
| utf8mb4_croatian_ci | 245 | MySQL rules differ from MariaDB rules
+---------------------+-----+
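The two listings can be compared mechanically. The following is a minimal Python sketch, assuming only the ID/name pairs shown in the two outputs above; the helper name classify is hypothetical and is not part of either server.

```python
# Collation IDs as listed in the two INFORMATION_SCHEMA outputs above.
MARIADB = {
    149: "ucs2_croatian_ci",
    213: "utf8_croatian_ci",
    214: "utf32_croatian_ci",
    215: "utf16_croatian_ci",
    245: "utf8mb4_croatian_ci",
}
MYSQL56 = {
    149: "ucs2_croatian_ci",
    213: "utf8_croatian_ci",
    214: "utf8_unicode_520_ci",
    215: "utf8_vietnamese_ci",
    245: "utf8mb4_croatian_ci",
}


def classify(collation_id):
    """Describe the conflict for one collation ID (hypothetical helper)."""
    maria = MARIADB[collation_id]
    mysql = MYSQL56[collation_id]
    if maria == mysql:
        # Same ID and name; per this ticket the sorting rules still differ.
        return "same name, different rules"
    return f"ID reused: MariaDB {maria} vs MySQL {mysql}"


if __name__ == "__main__":
    for cid in sorted(MARIADB):
        print(cid, classify(cid))
```

Either kind of conflict makes an index built by one server unsafe to read with the other, which is why both are treated as problems here.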

Solution

Collation changes

  • Bar moves MariaDB-5.5 xxx_croatian_ci collations to new IDs (preferably, outside of the 0..255 range), without changing the collation name.
  • Bar merges MySQL-5.6 xxx_croatian_ci using MySQL-5.6 IDs, but changing the names to xxx_croatian_mysql56_ci.
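The naming rule in the second item can be illustrated with a short Python sketch. The function name mariadb_name_for_mysql56 is hypothetical; it only shows how a MySQL-5.6 Croatian collation name would map to the proposed xxx_croatian_mysql56_ci form, and it says nothing about how the server stores these names.

```python
def mariadb_name_for_mysql56(name):
    """Map a MySQL-5.6 collation name to the name proposed in this ticket.

    Hypothetical helper: only xxx_croatian_ci names are renamed,
    everything else is returned unchanged.
    """
    if name.endswith("_croatian_ci"):
        # Drop the trailing "_ci" and append the "_mysql56_ci" marker.
        return name[: -len("_ci")] + "_mysql56_ci"
    return name


if __name__ == "__main__":
    for n in ("utf8_croatian_ci", "ucs2_croatian_ci", "utf8_vietnamese_ci"):
        print(n, "->", mariadb_name_for_mysql56(n))
```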

Detect attempts to open tables with the old MariaDB collations.

Bar fixes TABLE_SHARE::init_from_binary_frm_image() and adds an error message for tables created by any MariaDB version prior to 10.0.5 that have indexes using collation IDs 213, 149, 245, 215, 214:

+---------------------+---------+-----+---------+----------+---------+
| Collation           | Charset | Id  | Default | Compiled | Sortlen |
+---------------------+---------+-----+---------+----------+---------+
| utf8_croatian_ci    | utf8    | 213 |         | Yes      |       8 |
| ucs2_croatian_ci    | ucs2    | 149 |         | Yes      |       8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 |         | Yes      |       8 |
| utf16_croatian_ci   | utf16   | 215 |         | Yes      |       8 |
| utf32_croatian_ci   | utf32   | 214 |         | Yes      |       8 |
+---------------------+---------+-----+---------+----------+---------+

ER_TABLE_NEEDS_UPGRADE looks suitable for this purpose:

"Table upgrade required. Please do \"REPAIR TABLE `%-.32s`\" or dump/reload to fix it!"
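The detection rule described above can be sketched in Python as follows. This is not the code in TABLE_SHARE::init_from_binary_frm_image(); the function check_table, its parameters, and the version-tuple representation are assumptions made purely for illustration.

```python
# Hypothetical sketch; names are illustrative, not the server's actual code.
CONFLICTING_IDS = {149, 213, 214, 215, 245}  # IDs listed in this ticket
FIRST_SAFE_VERSION = (10, 0, 5)              # tables older than this are suspect


def check_table(name, created_by_version, index_collation_ids):
    """Return an error string if the table needs an upgrade, else None."""
    if created_by_version >= FIRST_SAFE_VERSION:
        return None
    if CONFLICTING_IDS.isdisjoint(index_collation_ids):
        return None
    # Mirror the "%-.32s" conversion: at most 32 characters of the name.
    return ('Table upgrade required. Please do "REPAIR TABLE `%s`" '
            'or dump/reload to fix it!' % name[:32])


if __name__ == "__main__":
    print(check_table("t1", (5, 5, 33), [213]))   # old table, affected index
    print(check_table("t1", (10, 0, 6), [213]))   # new table, no error
```

Checking only indexed columns matches the description: the incompatibility matters where sort order has been materialized on disk.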

mysql_upgrade

Monty will try to fix REPAIR to solve the conflicting IDs problem.

quick REPAIR

In the long term we can add a quick REPAIR to replace collation IDs in table definitions in FRM files and in engine-specific structure definitions (e.g. in MYI files for MyISAM) without having to do a full repair of the table.



 Comments   
Comment by Alexander Barkov [ 2013-11-14 ]

Pushed into MariaDB-10.0.6
