[MDEV-30577] Case folding for uca1400 collations is not up to date - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.10(EOL)
Fix Version/s: 10.10.4, 10.11.3, 11.0.2, 11.1.1
Component/s: Character Sets
Labels: None

Description

UCA1400 collations (added by MDEV-27009) currently use Unicode-5.2.0 case folding tables.

They should use Unicode-14.0.0 tables instead.

The difference between the Unicode-5.2.0 and Unicode-14.0.0 data files (see attached diff-520-1400.diff) shows that a few hundred new case folding mapping pairs were added in these letter scripts:

Cyrillic, Georgian, Cherokee, Glagolitic, Coptic, Latin, Osage, Vithkuqi, Old Hungarian, Warang Citi, Medefaidrin, Adlam.

This SQL script demonstrates the outdated case folding:

CREATE OR REPLACE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE uca1400_ai_ci);
# Insert letters that first appeared in Unicode-6.1 (released in January 2012)
INSERT INTO t1 VALUES (_ucs2 0xA792) /* U+A792 LATIN CAPITAL LETTER C WITH BAR */;
INSERT INTO t1 VALUES (_ucs2 0xA793) /* U+A793 LATIN SMALL LETTER C WITH BAR */;
SELECT HEX(a), HEX(LOWER(a)), HEX(UPPER(a)), a, LOWER(a), UPPER(a) FROM t1;

+--------+---------------+---------------+------+----------+----------+
| HEX(a) | HEX(LOWER(a)) | HEX(UPPER(a)) | a    | LOWER(a) | UPPER(a) |
+--------+---------------+---------------+------+----------+----------+
| EA9E92 | EA9E92        | EA9E92        | Ꞓ    | Ꞓ        | Ꞓ        |
| EA9E93 | EA9E93        | EA9E93        | ꞓ    | ꞓ        | ꞓ        |
+--------+---------------+---------------+------+----------+----------+

The two characters above (which first appeared in Unicode-6.1) are expected to map to each other under the UPPER and LOWER functions.
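As an independent cross-check of the expected mapping (not part of the original report), the same pair can be looked up in Python's bundled Unicode database. This is only a sketch: it assumes a Python 3 interpreter whose Unicode tables are at version 6.1 or newer, and the helper names `utf8_hex` and `case_pair_report` are made up for illustration. It says nothing about MariaDB internals; it only shows what HEX(LOWER(a)) and HEX(UPPER(a)) would look like with up-to-date case folding.

```python
import unicodedata

# The two code points from the report.
upper_c_bar = "\uA792"  # LATIN CAPITAL LETTER C WITH BAR
lower_c_bar = "\uA793"  # LATIN SMALL LETTER C WITH BAR


def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s as uppercase hex, similar to SQL HEX()."""
    return s.encode("utf-8").hex().upper()


def case_pair_report(ch: str) -> dict:
    """Hex of the character, its lowercase form, and its uppercase form."""
    return {
        "hex": utf8_hex(ch),
        "lower": utf8_hex(ch.lower()),
        "upper": utf8_hex(ch.upper()),
    }


if __name__ == "__main__":
    # The result depends on the Unicode version shipped with the interpreter.
    print("Unicode database version:", unicodedata.unidata_version)
    for ch in (upper_c_bar, lower_c_bar):
        print(unicodedata.name(ch), case_pair_report(ch))
```

With current case mapping data, the lowercase form of U+A792 is U+A793 and the uppercase form of U+A793 is U+A792, which is the behaviour the report expects from the uca1400 collations.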

Attachments

diff-500-1400.diff
24 kB
2023-09-27 07:44

Issue Links

blocks

- MDEV-19123 Change default charset from latin1 to utf8mb4 (Closed)
- MDEV-25829 Change default Unicode collation to uca1400_ai_ci (Closed)
- MDEV-27490 Allow full utf8mb4 for identifiers (Stalled)

is blocked by

- MDEV-30692 conf_to_src is not up to date (Closed)
- MDEV-30695 Refactor case folding data types in Asian collation (Closed)
- MDEV-30716 Wrong casefolding in xxx_unicode_520_ci for U+0700..U+07FF (Closed)
- MDEV-30746 Regression in ucs2_general_mysql500_ci (Closed)
- MDEV-31068 Reuse duplicate case conversion code in ctype-utf8.c and ctype-ucs2.c (Closed)
- MDEV-31069 Reuse duplicate char-to-weight conversion code in ctype-utf8.c and ctype-ucs2.c (Closed)
- MDEV-31071 Refactor case folding data types in Unicode collations (Closed)

relates to

- MDEV-27009 Add UCA-14.0.0 collations (Closed)
- MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 (Closed)

People

Assignee: Alexander Barkov
Reporter: Alexander Barkov
Votes: 0
Watchers: 2

Dates

Created: 2023-02-06 06:11
Updated: 2025-10-27 15:33
Resolved: 2023-04-18 08:20

Time Tracking

Estimated:
Remaining:
Logged: 6d 4.5h
