[MDEV-27137] command line client : using emoji breaks formatting, due to incorrectly calculated char width Created: 2021-11-29  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Character Sets, Scripts & Clients
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Vladislav Vaintroub Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File image-2021-11-29-10-28-29-351.png    

 Description   

The table is incorrectly formatted if an emoji is used and utf8mb4 is used on the client. This is not specific to Windows; it concerns the numcells charset method, which should return 2 for an emoji (it takes the same width as the Chinese character in the example, i.e. double the width of a narrow character in a monospace font). The "width" of a character is defined in https://unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt (the name is largely historical, since this file already existed back in Unicode 2.0, long before emoji). For example, the following line is included:

1F600..1F64F;W   # So    [80] GRINNING FACE..PERSON WITH FOLDED HANDS

I used 1F600 in the attached example.

Currently, my_numcells_mb() only handles CJK Ideograph Extension B and C outside of the BMP, but nothing in the emoji range.
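
To illustrate the expected behaviour, here is a minimal sketch of a width lookup that also covers the emoji range quoted above. It is not the MariaDB implementation: the function names sketch_cells_for_codepoint() and sketch_numcells_utf8() are hypothetical, only a few ranges marked W in EastAsianWidth.txt are hard-coded, and the UTF-8 decoder assumes well-formed input. A real fix would presumably derive its ranges from the full EastAsianWidth.txt data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not MariaDB code): display cells for one code point.
   Only a handful of ranges marked W in EastAsianWidth.txt are listed here. */
static int sketch_cells_for_codepoint(uint32_t wc)
{
  if (wc >= 0x4E00 && wc <= 0x9FFF)     /* CJK Unified Ideographs (BMP) */
    return 2;
  if (wc >= 0x1F600 && wc <= 0x1F64F)   /* emoji range quoted in this report */
    return 2;
  if (wc >= 0x20000 && wc <= 0x3FFFD)   /* planes 2 and 3 (CJK extensions) */
    return 2;
  return 1;                             /* everything else: assumed narrow */
}

/* Hypothetical helper: total display cells of a UTF-8 string.
   Assumes well-formed UTF-8; stops at a truncated trailing sequence. */
static size_t sketch_numcells_utf8(const char *s, size_t len)
{
  size_t cells = 0, i = 0;
  while (i < len)
  {
    unsigned char b = (unsigned char) s[i];
    uint32_t wc;
    size_t n;
    if (b < 0x80)      { wc = b;        n = 1; }
    else if (b < 0xE0) { wc = b & 0x1F; n = 2; }
    else if (b < 0xF0) { wc = b & 0x0F; n = 3; }
    else               { wc = b & 0x07; n = 4; }
    if (i + n > len)
      break;
    /* Fold in the 6 payload bits of each continuation byte. */
    for (size_t k = 1; k < n; k++)
      wc = (wc << 6) | ((unsigned char) s[i + k] & 0x3F);
    cells += (size_t) sketch_cells_for_codepoint(wc);
    i += n;
  }
  return cells;
}
```

With this sketch, the 4-byte UTF-8 encoding of U+1F600 (F0 9F 98 80) counts as two cells, the same as a 3-byte CJK ideograph, so a column padded from these counts would line up in a monospace terminal that renders emoji as double width.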

