[MDEV-7661] Unexpected result for: CAST(0xHHHH AS CHAR CHARACTER SET xxx) for incorrect byte sequences Created: 2015-03-04  Updated: 2017-09-15  Resolved: 2015-03-18

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: None
Fix Version/s: 10.1.4

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-6566 Different INSERT behaviour on bad byt... Closed

 Description   

This result is wrong:

mysql> SELECT HEX(CAST(0xA341 AS CHAR CHARACTER SET gb2312));
+------------------------------------------------+
| HEX(CAST(0xA341 AS CHAR CHARACTER SET gb2312)) |
+------------------------------------------------+
| A341                                           |
+------------------------------------------------+
1 row in set (1.15 sec)

0xA341 is not a well-formed gb2312 byte sequence:

mysql> SELECT _gb2312 0xA341;
ERROR 1300 (HY000): Invalid gb2312 character string: 'A341'

0xA3 is a multi-byte head byte, but it is not followed by a valid multi-byte tail byte.
The expected result is to replace the bad byte 0xA3 with '?' (0x3F) and return 0x3F41.
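The expected replacement behaviour can be sketched outside the server. This is a minimal illustration, not MariaDB's code: `fix_gb2312`, `is_gb2312_head` and `is_gb2312_tail` are hypothetical names, and the byte ranges (head 0xA1..0xF7, tail 0xA1..0xFE) are an assumed range-only check that does not test whether a two-byte code is actually assigned. Only the offending head byte becomes '?'; the following 0x41 is kept as an ordinary ASCII 'A'.

```python
def is_gb2312_head(b: int) -> bool:
    # Assumed lead-byte range of a two-byte GB2312 (EUC-CN) character.
    return 0xA1 <= b <= 0xF7


def is_gb2312_tail(b: int) -> bool:
    # Assumed trail-byte range of a two-byte GB2312 character.
    return 0xA1 <= b <= 0xFE


def fix_gb2312(data: bytes) -> bytes:
    """Hypothetical helper: replace each byte that does not start a
    well-formed character with '?' (0x3F)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII character: copy as-is.
            out.append(b)
            i += 1
        elif is_gb2312_head(b) and i + 1 < len(data) and is_gb2312_tail(data[i + 1]):
            # Head followed by a valid tail: copy the two-byte character.
            out += data[i:i + 2]
            i += 2
        else:
            # Bad head (or head without a valid tail): replace only this byte.
            out.append(0x3F)
            i += 1
    return bytes(out)


print(fix_gb2312(bytes.fromhex("A341")).hex().upper())  # 3F41
print(fix_gb2312(bytes.fromhex("A3C1")).hex().upper())  # A3C1
```

Replacing a single byte rather than the whole pair keeps the valid 'A' that follows the bad head.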

Additionally, badly formed sequences are converted to an unrelated character during
character set conversion:

mysql> SELECT HEX(CONVERT(CAST(0xA341 AS CHAR CHARACTER SET gb2312) USING utf16));
+---------------------------------------------------------------------+
| HEX(CONVERT(CAST(0xA341 AS CHAR CHARACTER SET gb2312) USING utf16)) |
+---------------------------------------------------------------------+
| FF21                                                                |
+---------------------------------------------------------------------+
1 row in set (0.00 sec)

A341 was converted to "U+FF21 FULLWIDTH LATIN CAPITAL LETTER A", which is wrong.

It seems A341 was erroneously taken as A3C1, which is the correct gb2312 encoding of U+FF21.
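One plausible way this can happen, assuming (this is a hypothesis, not confirmed by the report) that the decoder builds a 16-bit code from the two bytes and masks it with 0x7F7F to index its mapping table without first validating the tail byte: masking clears the high bit of 0xC1, so 0xA341 and 0xA3C1 collapse to the same table index. `masked_index` is a hypothetical name used only for illustration; the decode call uses Python's standard `gb2312` codec.

```python
def masked_index(seq: bytes) -> int:
    # Hypothetical decoder step: combine two bytes into a 16-bit code and
    # clear the high bit of each, with no check that seq[1] is a valid tail.
    return ((seq[0] << 8) | seq[1]) & 0x7F7F


bad = bytes.fromhex("A341")   # malformed: 0x41 is not a gb2312 tail byte
good = bytes.fromhex("A3C1")  # well-formed gb2312 character

# Both sequences produce the same masked index.
print(hex(masked_index(bad)), hex(masked_index(good)))  # 0x2341 0x2341

# The well-formed sequence is FULLWIDTH LATIN CAPITAL LETTER A.
print(hex(ord(good.decode("gb2312"))))  # 0xff21
```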


Generated at Thu Feb 08 07:21:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.