[MDEV-32904] smiley emoji (F09F9883) valid in utf8 but not utf8mb4 Created: 2023-11-28 Updated: 2023-12-02 Resolved: 2023-11-29 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 10.2.44, 10.4.32, 10.6.16 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Alexander Barkov |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
Our smiley emoji is converted to a question mark under utf8mb4 but displays correctly under utf8mb3.
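For reference, the byte sequence in the title is the UTF-8 encoding of U+1F603 (SMILING FACE WITH OPEN MOUTH), a supplementary character outside the Basic Multilingual Plane. The short Python check below is not part of the report. It decodes the bytes and shows that the character needs four bytes. That is more than utf8mb3's three-byte maximum, so only utf8mb4 can store it.

```python
# The emoji from the report, given as its UTF-8 byte sequence.
raw = bytes.fromhex("F09F9883")

# Decode as standard UTF-8 (Python's "utf-8" codec accepts 4-byte sequences).
smiley = raw.decode("utf-8")

code_point = ord(smiley)
print(f"U+{code_point:04X}")   # U+1F603
print(len(raw))                # 4

# utf8mb3 stores at most 3 bytes per character, which covers only the
# Basic Multilingual Plane (U+0000..U+FFFF). Anything above needs utf8mb4.
fits_utf8mb3 = code_point <= 0xFFFF
print(fits_utf8mb3)            # False
```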
|
| Comments |
| Comment by Alexander Barkov [ 2023-11-29 ] |
|
Can you please send the output of:
I think there's something wrong with @@character_set_connection.
| Comment by Alexander Barkov [ 2023-11-29 ] |
|
It's better to use SET NAMES utf8mb4 than to set @@character_set_client, @@character_set_connection and @@character_set_results directly and separately from each other.
It looks like this issue should be closed as Not a Bug.
| Comment by Daniel Black [ 2023-11-29 ] |
|
| Comment by Alexander Barkov [ 2023-11-29 ] |
|
I can see no bugs. Works with utf8mb4 as expected. Closed as Not a Bug.
| Comment by Daniel Black [ 2023-11-30 ] |
|
So no warning is generated for the truncation? Under SET NAMES utf8mb4 the column title is "hex('?')". Under character_set_results=utf8mb4 the hex form of the column is displayed as below:
| Comment by Alexander Barkov [ 2023-12-02 ] |
|
The warning is not needed: the truncation does not happen in the data. Have a look at my previous comment:
It correctly returns the 4-byte utf8 character 0xF09F9883. The truncation does happen in the column title, because identifiers do not support supplementary characters yet. This will be fixed by MDEV-27490.
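The split between data and column title can be illustrated with a small model. The sketch below is a simplification written for illustration, not MariaDB's implementation. The helper `to_identifier_charset` is hypothetical: it assumes that converting text for an identifier replaces every character above U+FFFF with '?'. The column value is computed from the data itself, so it keeps all four bytes.

```python
def to_identifier_charset(text: str) -> str:
    """Hypothetical model: replace supplementary characters with '?'.

    Mirrors the behaviour described in the comment (identifiers do not yet
    support characters above U+FFFF); it is not MariaDB's actual code.
    """
    return "".join("?" if ord(ch) > 0xFFFF else ch for ch in text)


smiley = bytes.fromhex("F09F9883").decode("utf-8")

# The column title is derived from the select expression text.
expression = f"hex('{smiley}')"
column_title = to_identifier_charset(expression)

# The column value is computed from the data, which keeps all four bytes.
column_value = smiley.encode("utf-8").hex().upper()

print(column_title)   # hex('?')
print(column_value)   # F09F9883
```

In this model only the metadata loses information, which matches the conclusion that no data-truncation warning is needed.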