[MDEV-30661] UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 Created: 2023-02-16  Updated: 2023-05-08  Resolved: 2023-02-17

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.10, 10.11
Fix Version/s: 10.11.3, 11.0.2, 10.10.4

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-19123 Change default charset from latin1 to... Open
blocks MDEV-25829 Change default collation to utf8mb4_u... In Review
blocks MDEV-27490 Allow full utf8mb4 for identifiers Stalled
Problem/Incident
causes MCOL-5437 columnstore fails to compile to due o... Closed
Relates
relates to MDEV-30577 Case folding for uca1400 collations i... Closed
relates to MDEV-30556 UPPER() returns an empty string for U... Closed

Description

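The problem described in MDEV-30556 is also repeatable in 10.10 with uca1400 collations: LOWER() or UPPER() returns an empty string for some characters. The following script scans all code points and collects the affected ones: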

CREATE OR REPLACE TABLE bad_case_folding
(
  code INT NOT NULL,
  c VARCHAR(32) CHARACTER SET utf8mb4 COLLATE uca1400_ai_ci NOT NULL
);
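DELIMITER $$

-- Scan every Unicode code point and record those whose case conversion is empty.
FOR code IN 0..0x10FFFF
DO
  BEGIN
    -- Note: str is declared with utf8mb4_unicode_520_ci, so this filter runs under
    -- that collation; the SELECT below uses the column's uca1400_ai_ci collation.
    DECLARE str TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci DEFAULT CHAR(code USING utf32);
    IF LENGTH(LOWER(str))=0 OR LENGTH(UPPER(str))=0 THEN
      INSERT INTO bad_case_folding VALUES (code, str);
    END IF;
  END;
END FOR;
$$
DELIMITER ;
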
SELECT HEX(code), HEX(LOWER(c)), HEX(UPPER(c)), c FROM bad_case_folding;

+-----------+---------------+---------------+----+
| HEX(code) | HEX(LOWER(c)) | HEX(UPPER(c)) | c  |
+-----------+---------------+---------------+----+
| 23A       |               | C8BA          | Ⱥ  |
| 23E       |               | C8BE          | Ⱦ  |
| 23F       | C8BF          |               | ȿ  |
| 240       | C980          |               | ɀ  |
| 250       | C990          |               | ɐ  |
| 251       | C991          |               | ɑ  |
| 252       | C992          |               | ɒ  |
| 26B       | C9AB          |               | ɫ  |
| 271       | C9B1          |               | ɱ  |
| 27D       | C9BD          |               | ɽ  |
+-----------+---------------+---------------+----+

Or a faster version of the test, which checks only the affected code points:

CREATE OR REPLACE TABLE bad_case_folding
(
  code INT NOT NULL,
  c VARCHAR(32) CHARACTER SET utf8mb4 COLLATE uca1400_ai_ci NOT NULL DEFAULT ''
);
INSERT INTO bad_case_folding (code) VALUES (0x23A),(0x23E),(0x23F),(0x240),(0x250),(0x251),(0x252),(0x26B),(0x271),(0x27D);
UPDATE bad_case_folding SET c=CHAR(code USING utf32);
SELECT HEX(code), HEX(LOWER(c)), HEX(UPPER(c)), c FROM bad_case_folding ORDER BY code;

+-----------+---------------+---------------+----+
| HEX(code) | HEX(LOWER(c)) | HEX(UPPER(c)) | c  |
+-----------+---------------+---------------+----+
| 23A       |               | C8BA          | Ⱥ  |
| 23E       |               | C8BE          | Ⱦ  |
| 23F       | C8BF          |               | ȿ  |
| 240       | C980          |               | ɀ  |
| 250       | C990          |               | ɐ  |
| 251       | C991          |               | ɑ  |
| 252       | C992          |               | ɒ  |
| 26B       | C9AB          |               | ɫ  |
| 271       | C9B1          |               | ɱ  |
| 27D       | C9BD          |               | ɽ  |
+-----------+---------------+---------------+----+
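
The report does not list the expected values for the empty cells. As a rough cross-check outside the server, the sketch below prints the case mappings that Python's built-in Unicode tables give for the same ten code points, in the same HEX() style as the tables above. It assumes the intended server results follow Unicode's case mappings; the names CODE_POINTS, utf8_hex and case_mappings are hypothetical helpers, not part of MariaDB.

```python
from typing import Tuple

# Code points listed in the result tables of this report.
CODE_POINTS = [0x23A, 0x23E, 0x23F, 0x240, 0x250, 0x251, 0x252, 0x26B, 0x271, 0x27D]


def utf8_hex(s: str) -> str:
    """Return the UTF-8 encoding of s as uppercase hex, similar to HEX() on a utf8mb4 string."""
    return s.encode("utf-8").hex().upper()


def case_mappings(code: int) -> Tuple[str, str, str]:
    """Return (hex code point, hex of lowercase, hex of uppercase) using Python's Unicode data."""
    ch = chr(code)
    return format(code, "X"), utf8_hex(ch.lower()), utf8_hex(ch.upper())


# One line per code point: code, HEX(lower), HEX(upper).
for code in CODE_POINTS:
    print(*case_mappings(code))
```

For U+0251, for example, Python reports the uppercase form U+2C6D, which is E2B1AD in UTF-8. In every row, the value missing from the tables above corresponds to a mapping that is 3 bytes long in UTF-8, while the source character is 2 bytes long. This is only an observation from the data; the report itself does not state the cause.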

