Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Version(s): 10.0.4, 5.5.33a
Description
There are incompatibilities between some MariaDB and MySQL collations
which we need to solve somehow.
Problems
1.
The utf8_croatian_ci and ucs2_croatian_ci collations appeared in MariaDB-5.1 at the end of 2009, based on Alexander Barkov's patch from: http://collation-charts.org/articles/croatian.htm
Later, Croatian collations were added to MySQL-5.6.
However, the MariaDB Croatian collations use the latest version of the rules from http://unicode.org/cldr/trac/browser/trunk/common/collation/hr.xml while MySQL implements an older version.
The rules differ in only 3 letters, but that is enough to make the indexes incompatible.
As a result:
- utf8_croatian_ci (ID 213) is different in MariaDB and MySQL
- ucs2_croatian_ci (ID 149) is different in MariaDB and MySQL
2.
Later, MySQL-5.5 added support for utf8mb4, utf16, utf32. When merging the new character sets (MySQL-5.5 -> MariaDB-5.5) the MariaDB team added the following corresponding collations, for symmetry with utf8 and ucs2:
- utf8mb4_croatian_ci (ID=245)
- utf16_croatian_ci (ID=215)
- utf32_croatian_ci (ID=214)
But when the collations with the same names finally appeared in MySQL-5.6, they were given different IDs. So in MySQL-5.6 the IDs 214 and 215 are assigned to other collations, while ID 245 is utf8mb4_croatian_ci with rules that differ from MariaDB's.
This is what we have in MariaDB:
mysql> SELECT COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
    -> WHERE COLLATION_NAME LIKE 'u%croat%';
+---------------------+-----+
| COLLATION_NAME      | ID  |
+---------------------+-----+
| ucs2_croatian_ci    | 149 |
| utf8_croatian_ci    | 213 |
| utf32_croatian_ci   | 214 |
| utf16_croatian_ci   | 215 |
| utf8mb4_croatian_ci | 245 |
+---------------------+-----+
5 rows in set (0.01 sec)
This is what we have in MySQL-5.6:
mysql> SELECT COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
    -> WHERE ID IN (149,213,214,215,245);
+---------------------+-----+
| COLLATION_NAME      | ID  | Problem:
+---------------------+-----+
| ucs2_croatian_ci    | 149 | MySQL rules differ from MariaDB rules
| utf8_croatian_ci    | 213 | MySQL rules differ from MariaDB rules
| utf8_unicode_520_ci | 214 | MariaDB utf32_croatian_ci
| utf8_vietnamese_ci  | 215 | MariaDB utf16_croatian_ci
| utf8mb4_croatian_ci | 245 | MySQL rules differ from MariaDB rules
+---------------------+-----+
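The two listings can be compared mechanically. The following is only a rough sketch in Python, not part of any server code: the dictionaries copy the IDs and names from the listings, while the function name classify and the wording of its labels are made up for illustration.

```python
# Collation IDs and names copied from the MariaDB and MySQL-5.6 listings.
MARIADB = {
    149: "ucs2_croatian_ci",
    213: "utf8_croatian_ci",
    214: "utf32_croatian_ci",
    215: "utf16_croatian_ci",
    245: "utf8mb4_croatian_ci",
}
MYSQL56 = {
    149: "ucs2_croatian_ci",
    213: "utf8_croatian_ci",
    214: "utf8_unicode_520_ci",
    215: "utf8_vietnamese_ci",
    245: "utf8mb4_croatian_ci",
}


def classify(collation_id):
    """Describe the conflict for one of the five affected IDs (hypothetical labels)."""
    maria, mysql = MARIADB[collation_id], MYSQL56[collation_id]
    if maria == mysql:
        # For these five IDs the issue states that equal names still
        # mean different sorting rules; this is not a general rule.
        return "same name, different rules"
    return "ID reused: MariaDB %s vs MySQL %s" % (maria, mysql)


if __name__ == "__main__":
    for cid in sorted(MARIADB):
        print(cid, classify(cid))
```

Either kind of conflict makes an index built by one server unsafe to read with the other, which is why both are handled in the solution.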
Solution
Collation changes
- Bar moves MariaDB-5.5 xxx_croatian_ci collations to new IDs (preferably outside of the 0..255 range), without changing the collation names.
- Bar merges MySQL-5.6 xxx_croatian_ci using MySQL-5.6 IDs, but changing the names to xxx_croatian_mysql56_ci.
Detect attempts to open tables with the old MariaDB collations.
Bar fixes TABLE_SHARE::init_from_binary_frm_image() and adds an error message for any table that was created by a MariaDB version prior to 10.0.5 and has indexes using collation IDs 213, 149, 245, 215, 214:
+---------------------+---------+-----+---------+----------+---------+
| Collation           | Charset | Id  | Default | Compiled | Sortlen |
+---------------------+---------+-----+---------+----------+---------+
| utf8_croatian_ci    | utf8    | 213 |         | Yes      |       8 |
| ucs2_croatian_ci    | ucs2    | 149 |         | Yes      |       8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 |         | Yes      |       8 |
| utf16_croatian_ci   | utf16   | 215 |         | Yes      |       8 |
| utf32_croatian_ci   | utf32   | 214 |         | Yes      |       8 |
+---------------------+---------+-----+---------+----------+---------+
ER_TABLE_NEEDS_UPGRADE looks suitable for this purpose:
"Table upgrade required. Please do \"REPAIR TABLE `%-.32s`\" or dump/reload to fix it!"
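The intended check can be sketched as follows. This is a Python illustration only, under stated assumptions: the real change belongs in the C++ function TABLE_SHARE::init_from_binary_frm_image(), and the names needs_upgrade, check_table, and the version-tuple representation are hypothetical. It also assumes the creating server version is already known and does not try to tell MariaDB-created tables from MySQL-created ones.

```python
# Collation IDs from the table in the description.
OLD_CROATIAN_IDS = {149, 213, 214, 215, 245}

# Per the description, tables created before this version are affected.
FIXED_IN = (10, 0, 5)

# Message text of ER_TABLE_NEEDS_UPGRADE as quoted in the description.
ER_TABLE_NEEDS_UPGRADE = (
    'Table upgrade required. Please do "REPAIR TABLE `%-.32s`" '
    "or dump/reload to fix it!"
)


def needs_upgrade(created_by, index_collation_ids):
    """True if an old table has an index on one of the conflicting IDs.

    created_by is a (major, minor, patch) tuple (an assumed representation).
    """
    uses_old_id = bool(OLD_CROATIAN_IDS & set(index_collation_ids))
    return created_by < FIXED_IN and uses_old_id


def check_table(name, created_by, index_collation_ids):
    """Return the error text, or None if the table can be opened as is."""
    if needs_upgrade(created_by, index_collation_ids):
        return ER_TABLE_NEEDS_UPGRADE % name
    return None


if __name__ == "__main__":
    print(check_table("t1", (5, 5, 32), [213]))
    print(check_table("t1", (10, 0, 5), [213]))
```

Only indexed columns matter here, because the stored order of index entries depends on the collation rules; unindexed data is compared at query time.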
mysql_upgrade
Monty will try to fix REPAIR to solve the conflicting IDs problem.
quick REPAIR
In the long term we can add a quick REPAIR that replaces collation IDs in table definitions in FRM files and in engine-specific structure definitions (e.g. in MYI files for MyISAM) without having to do a full repair of the table.
Issue Links
- relates to MDEV-16945 main.ctype_upgrade failed in buildbot, error upon mysql_upgrade (Open)
Activity
Field | Original Value | New Value |
---|---|---|
Resolution | | Fixed [ 1 ] |
Status | Open [ 1 ] | Closed [ 6 ] |
Workflow | defaullt [ 29611 ] | MariaDB v2 [ 43494 ] |
Workflow | MariaDB v2 [ 43494 ] | MariaDB v3 [ 62611 ] |
Link | | This issue relates to MDEV-16945 [ MDEV-16945 ] |
Workflow | MariaDB v3 [ 62611 ] | MariaDB v4 [ 147198 ] |