[MDEV-27490] Allow full utf8mb4 for identifiers Created: 2022-01-13  Updated: 2023-12-22

Status: Stalled
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 11.5

Type: New Feature Priority: Critical
Reporter: Vladislav Vaintroub Assignee: Alexander Barkov
Resolution: Unresolved Votes: 1
Labels: None

Attachments: PNG File screenshot.png    
Issue Links:
Blocks
is blocked by MDEV-30556 UPPER() returns an empty string for U... Closed
is blocked by MDEV-30577 Case folding for uca1400 collations i... Closed
is blocked by MDEV-30661 UPPER() returns an empty string for U... Closed
is blocked by MDEV-31340 Remove MY_COLLATION_HANDLER::strcasec... In Review
is blocked by MDEV-31531 Remove my_casedn_str() and my_caseup_... In Testing
is blocked by MDEV-31606 Refactor check_db_name() to get a con... Closed
is blocked by MDEV-31972 Change parameter of make_sp_name*() f... Closed
is blocked by MDEV-31978 Turn ok_for_lower_case_names() to a m... Closed
is blocked by MDEV-32002 Remove my_casedn_str() in append_iden... Closed
is blocked by MDEV-32019 Replace my_casedn_str(local_buffer) t... Closed
is blocked by MDEV-32081 Remove my_casedn_str() from get_canon... Closed
Relates
relates to MDEV-19123 Change default charset from latin1 to... Open
relates to MDEV-25829 Change default collation to utf8mb4_u... In Review
relates to MDEV-32904 smiley emoji (F09F9883) valid in utf8... Closed

 Description   

Identifier names can't contain characters outside of the BMP, i.e. they are restricted to utf8mb3.
Here is the relevant part of a Slack discussion on why this is so, and on a possible fix.

... discussion on character_set_system  and why it is utf8mb3...
....
bar Oct 13th, 2021 at 4:23 PM
@wlad yes, it's hard-coded. I think the biggest problem is to implement table-name-to-file-name encoding for non-BMP characters. Should be doable but needs some time.
 
wlad  3 months ago
so, a surrogate pair won't do? like, @d801@dc37
 
bar  3 months ago
for characters that do not have lower/upper variants, it will do.
 
bar  3 months ago
It will actually do for characters that have lower/upper variants as well.
 
bar  3 months ago
Thanks for the good idea.
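
To illustrate the surrogate-pair idea from the thread, here is a minimal Python sketch. It is not MariaDB's implementation. It assumes that each UTF-16 code unit is written as "@" plus four lowercase hex digits, which is the shape of the @d801@dc37 example above. The function names surrogate_pair and encode_non_bmp are hypothetical.

```python
def surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary (non-BMP) code point")
    offset = code_point - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def encode_non_bmp(ch):
    """Render one non-BMP character as two '@xxxx' units (assumed format)."""
    high, low = surrogate_pair(ord(ch))
    return "@{:04x}@{:04x}".format(high, low)


# U+10437 is the code point whose surrogate pair is D801 DC37.
print(encode_non_bmp("\U00010437"))  # @d801@dc37
```

Under the same assumption, the smiley emoji mentioned in MDEV-32904 (UTF-8 bytes F09F9883, i.e. U+1F603) would be written as @d83d@de03.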


Generated at Thu Feb 08 09:53:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.