[MDEV-6596] Unassigned characters are not fully supported in ENUM and SET Created: 2014-08-18  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.3.12, 5.5.39, 10.0.13
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Minor
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Start a terminal session using character set big5.
In gnome-terminal:
Terminal -> Character Coding -> Traditional Chinese (big5)

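Make sure everything works as expected. Note: the first string literal below (and in the ENUM example later) is the unassigned Big5 character 0xC840. It has no Unicode equivalent, so it appears as an empty string in this export.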

LANG=zh_TW.big5 mysql --default-character-set=big5 --table << END
SET NAMES big5;
SELECT HEX(''),HEX('乂');
END

should return:

+----------+-----------+
| HEX('?') | HEX('乂') |
+----------+-----------+
| C840     | C940      |
+----------+-----------+

If you get a different output, then something is wrong with the terminal
character set settings.

Note that the character with Big5 code C840 is unassigned
(it has no Unicode mapping), while the character with
Big5 code C940 is assigned.
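The difference between an assigned and an unassigned code can be checked outside the server. The sketch below uses Python's built-in `big5` codec as a stand-in. It is an assumption that Python's table agrees with MariaDB's big5 table for these two codes, and `has_unicode_mapping` is a hypothetical helper, not a MariaDB API.

```python
def has_unicode_mapping(code: bytes, encoding: str = "big5") -> bool:
    """Return True if `code` decodes cleanly with Python's codec.

    Assumption: Python's big5 table matches MariaDB's big5 table for the
    codes tested here. This is an illustration, not a MariaDB API.
    """
    try:
        code.decode(encoding)  # strict error handling: raises on unmapped bytes
    except UnicodeDecodeError:
        return False
    return True


assigned = bytes.fromhex("C940")
unassigned = bytes.fromhex("C840")

print(f"U+{ord(assigned.decode('big5')):04X}")  # code point of the assigned character
print("C940 mapped:", has_unicode_mapping(assigned))
print("C840 mapped:", has_unicode_mapping(unassigned))
```

If Python's table agrees with MariaDB's, the last line reports False for C840, which lies in a range that standard Big5 leaves unassigned.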

Now create an ENUM with an unassigned and an assigned character:

LANG=zh_TW.big5 mysql --default-character-set=big5 --table test << END
SET NAMES big5;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a ENUM('','乂') CHARACTER SET big5);
SHOW CREATE TABLE t1;
INSERT INTO t1 VALUES (''),('乂');
SELECT HEX(a),a FROM t1;
END

The output will be:

+-------+-----------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                    |
+-------+-----------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `a` enum('?','乂') CHARACTER SET big5 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------------------------------------------------------------------+
+--------+------+
| HEX(a) | a    |
+--------+------+
| C840   |    |
| C940   | 乂   |
+--------+------+

Note that the unassigned character was converted to a question mark
in the SHOW CREATE TABLE output, although INSERT and SELECT work correctly.
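The asymmetry can be modelled as two paths. Column data travels big5 to big5 with no conversion, while the table definition is presumably rendered through Unicode before being sent to the client. The one-entry table and the function name below are hypothetical; this is a sketch of the observed behaviour, not MariaDB's code.

```python
# Hypothetical Big5 -> Unicode table with only the codes from the report:
# C940 is U+4E42; C840 is absent because it is unassigned.
BIG5_TO_UNICODE = {b"\xc9\x40": "\u4e42"}


def big5_label_to_unicode(label: bytes) -> str:
    """Convert a Big5 string to Unicode, writing '?' for unmapped codes."""
    out = []
    i = 0
    while i < len(label):
        if label[i] < 0x80:  # single-byte ASCII maps to itself
            out.append(chr(label[i]))
            i += 1
        else:  # Big5 double-byte code: look it up, '?' if unmapped
            out.append(BIG5_TO_UNICODE.get(label[i:i + 2], "?"))
            i += 2
    return "".join(out)


stored_labels = [b"\xc8\x40", b"\xc9\x40"]  # ENUM labels as stored (big5)

# Data path: client and column are both big5, so bytes pass through unchanged.
select_result = [label.hex().upper() for label in stored_labels]

# Metadata path: the definition goes through Unicode, which is lossy.
show_create = "enum(" + ",".join(
    f"'{big5_label_to_unicode(label)}'" for label in stored_labels
) + ")"

print(select_result)  # ['C840', 'C940']
print(show_create)    # enum('?','乂')
```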

Now dump and restore:

mysqldump --socket=/tmp/mysql.sock test >t1.sql
mysql -e "drop table t1" test
mysql test <t1.sql
mysql -e "select hex(a),a from t1" test

The output will be:

+--------+------+
| hex(a) | a    |
+--------+------+
| 3F     | ?    |
| C940   | 乂   |
+--------+------+

The unassigned character is lost: after the restore, the first row holds 3F ('?') instead of C840.
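Assuming the dump is written in mysqldump's default utf8 connection character set, the loss can be reproduced as a Big5 to UTF-8 to Big5 round trip. Any code without a Unicode mapping is replaced by '?' on the way out and comes back as 0x3F. The mapping table and function names are hypothetical and cover only the two codes from this report.

```python
# Hypothetical mapping tables covering only the two codes in this report.
BIG5_TO_UNICODE = {b"\xc9\x40": "\u4e42"}  # C840 deliberately unmapped
UNICODE_TO_BIG5 = {u: b for b, u in BIG5_TO_UNICODE.items()}


def big5_to_utf8(data: bytes) -> bytes:
    """Dump side: convert Big5 to UTF-8, writing '?' for unmapped codes."""
    out = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:  # ASCII byte
            out.append(chr(data[i]))
            i += 1
        else:  # Big5 double-byte code
            out.append(BIG5_TO_UNICODE.get(data[i:i + 2], "?"))
            i += 2
    return "".join(out).encode("utf-8")


def utf8_to_big5(data: bytes) -> bytes:
    """Restore side: convert UTF-8 back to Big5, '?' for unmappable chars."""
    out = bytearray()
    for ch in data.decode("utf-8"):
        if ord(ch) < 0x80:
            out += ch.encode("ascii")
        else:
            out += UNICODE_TO_BIG5.get(ch, b"?")
    return bytes(out)


for value in (b"\xc8\x40", b"\xc9\x40"):
    restored = utf8_to_big5(big5_to_utf8(value))
    print(value.hex().upper(), "->", restored.hex().upper())
# C840 -> 3F
# C940 -> C940
```

The '?' written at dump time is an ordinary ASCII character, so nothing in the restored data records that a different byte sequence was there originally.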


Generated at Thu Feb 08 07:13:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.