Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
10.10(EOL)
None
Description
UCA1400 collations (added by MDEV-27009) currently use Unicode-5.2.0 case folding tables.
They should use Unicode-14.0.0 tables instead.
The difference (see attached diff-520-1400.diff) between these two files:
- https://www.unicode.org/Public/5.2.0/ucd/CaseFolding.txt
- https://www.unicode.org/Public/14.0.0/ucd/CaseFolding.txt
shows that a few hundred new case folding mapping pairs were added in the following scripts:
Cyrillic, Georgian, Cherokee, Glagolitic, Coptic, Latin, Osage, Vithkuqi, Old Hungarian, Warang Citi, Medefaidrin, Adlam.
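To make the comparison concrete, here is a minimal Python sketch (not part of the MariaDB sources) of how such a diff between two CaseFolding.txt versions could be computed. It assumes the file's line format `<code>; <status>; <mapping>; # <name>` and keeps only the simple mappings (status `C` and `S`), skipping full (`F`) and Turkic (`T`) entries. The functions `parse_case_folding` and `changed_pairs` are hypothetical helpers, and the two sample strings are tiny hand-picked excerpts, not the real files; in practice one would pass the full text of both downloaded files.

```python
def parse_case_folding(text: str) -> dict:
    """Return {code point: folded code point} for simple (C and S) mappings."""
    mappings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and comment-only lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        # F (full) mappings may contain several code points; T is Turkic-specific.
        if status in ("C", "S"):
            mappings[int(code, 16)] = int(mapping, 16)
    return mappings


def changed_pairs(old_text: str, new_text: str) -> dict:
    """Mappings present in the new file that are missing or different in the old one."""
    old = parse_case_folding(old_text)
    new = parse_case_folding(new_text)
    return {cp: m for cp, m in new.items() if old.get(cp) != m}


# Illustrative excerpts only (hypothetical samples, not the full 5.2.0 / 14.0.0 files).
OLD_SAMPLE = """\
# excerpt in CaseFolding.txt format
0041; C; 0061; # LATIN CAPITAL LETTER A
"""
NEW_SAMPLE = """\
# excerpt in CaseFolding.txt format
0041; C; 0061; # LATIN CAPITAL LETTER A
A792; C; A793; # LATIN CAPITAL LETTER C WITH BAR
"""

for cp, folded in sorted(changed_pairs(OLD_SAMPLE, NEW_SAMPLE).items()):
    print(f"U+{cp:04X} -> U+{folded:04X}")
```

With these excerpts the only reported pair is U+A792 -> U+A793, the same pair used in the SQL reproduction.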
This SQL script demonstrates the outdated case folding:
CREATE OR REPLACE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE uca1400_ai_ci);
# Insert letters that first appeared in Unicode-6.1 (released in January 2012)
INSERT INTO t1 VALUES (_ucs2 0xA792) /* U+A792 LATIN CAPITAL LETTER C WITH BAR */;
INSERT INTO t1 VALUES (_ucs2 0xA793) /* U+A793 LATIN SMALL LETTER C WITH BAR */;
SELECT HEX(a), HEX(LOWER(a)), HEX(UPPER(a)), a, LOWER(a), UPPER(a) FROM t1;
+--------+---------------+---------------+------+----------+----------+
| HEX(a) | HEX(LOWER(a)) | HEX(UPPER(a)) | a    | LOWER(a) | UPPER(a) |
+--------+---------------+---------------+------+----------+----------+
| EA9E92 | EA9E92        | EA9E92        | Ꞓ    | Ꞓ        | Ꞓ        |
| EA9E93 | EA9E93        | EA9E93        | ꞓ    | ꞓ        | ꞓ        |
+--------+---------------+---------------+------+----------+----------+
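The two characters above (which first appeared in Unicode-6.1) are expected to map to each other via UPPER() and LOWER(): LOWER() of U+A792 should return U+A793, and UPPER() of U+A793 should return U+A792. Instead, both functions currently return the input unchanged.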
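As a cross-check of the expected result, the following Python sketch (not MariaDB code) rebuilds the columns of the SELECT above using Python's `str.lower()`/`str.upper()` and UTF-8 encoding. It assumes that Python's case mapping, which follows the Unicode Character Database bundled with the interpreter (reported by `unicodedata.unidata_version`, 6.1 or newer since Python 3.3), matches what a fixed uca1400 collation should return for this pair. The helpers `hex_utf8` and `expected_row` are hypothetical and only mimic `HEX()`, `LOWER()` and `UPPER()` on a utf8 value.

```python
import unicodedata


def hex_utf8(s: str) -> str:
    """Uppercase hex of the UTF-8 bytes, similar to SQL HEX() on a utf8 string."""
    return s.encode("utf-8").hex().upper()


def expected_row(a: str) -> tuple:
    """(HEX(a), HEX(LOWER(a)), HEX(UPPER(a)), a, LOWER(a), UPPER(a)) per Python's case mapping."""
    return (hex_utf8(a), hex_utf8(a.lower()), hex_utf8(a.upper()), a, a.lower(), a.upper())


print("Python UCD version:", unicodedata.unidata_version)
for a in ("\uA792", "\uA793"):  # U+A792 / U+A793 LATIN CAPITAL / SMALL LETTER C WITH BAR
    print(" | ".join(expected_row(a)))
```

The hex columns come out as EA9E92 | EA9E93 | EA9E92 for the capital letter and EA9E93 | EA9E93 | EA9E92 for the small letter, which is the mutual mapping this issue asks the uca1400 collations to produce.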
Attachments
Issue Links
blocks
- MDEV-19123 Change default charset from latin1 to utf8mb4 (Closed)
- MDEV-25829 Change default Unicode collation to uca1400_ai_ci (Closed)
- MDEV-27490 Allow full utf8mb4 for identifiers (Stalled)
is blocked by
- MDEV-30692 conf_to_src is not up to date (Closed)
- MDEV-30695 Refactor case folding data types in Asian collation (Closed)
- MDEV-30716 Wrong casefolding in xxx_unicode_520_ci for U+0700..U+07FF (Closed)
- MDEV-30746 Regression in ucs2_general_mysql500_ci (Closed)
- MDEV-31068 Reuse duplicate case conversion code in ctype-utf8.c and ctype-ucs2.c (Closed)
- MDEV-31069 Reuse duplicate char-to-weight conversion code in ctype-utf8.c and ctype-ucs2.c (Closed)
- MDEV-31071 Refactor case folding data types in Unicode collations (Closed)
relates to
- MDEV-27009 Add UCA-14.0.0 collations (Closed)
- MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 (Closed)
Activity
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue blocks MDEV-27490 [ MDEV-27490 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue blocks |
Fix Version/s | 10.11 [ 27614 ] |
Link | This issue relates to |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue is blocked by |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue is blocked by |
Link | This issue is caused by |
Link | This issue is caused by |
Link | This issue is blocked by |
Link | This issue is blocked by |
Link | This issue is blocked by |
Link | This issue is blocked by |
Link | This issue is blocked by |
issue.field.resolutiondate | 2023-04-18 08:20:05.0 | 2023-04-18 08:20:05.322 |
Fix Version/s | 10.10.4 [ 28522 ] | |
Fix Version/s | 10.11.3 [ 28524 ] | |
Fix Version/s | 11.1.1 [ 28704 ] | |
Fix Version/s | 11.0.2 [ 28706 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Attachment | diff-500-1400.diff [ 72118 ] |
Link | This issue blocks |
Link | This issue relates to |