MariaDB Server / MDEV-6596

Unassigned characters are not fully supported in ENUM and SET


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.3.12, 5.5.39, 10.0.13
    • Fix Version/s: 10.4, 10.5, 10.6
    • Component/s: None
    • Labels: None

    Description

      Start a terminal session that uses the big5 character set.
      In gnome-terminal:
      Terminal -> Character Coding -> Traditional Chinese (big5)

      Check that the terminal encoding works:

      LANG=zh_TW.big5 mysql --default-character-set=big5 --table << END
      SET NAMES big5;
      SELECT HEX(''),HEX('乂');
      END
      

      should return:

      +----------+-----------+
      | HEX('?') | HEX('乂') |
      +----------+-----------+
      | C840     | C940      |
      +----------+-----------+
      

      If the output differs, the terminal character set settings are
      wrong.
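      The expected value for the assigned character can also be
      cross-checked offline. This minimal sketch uses Python's built-in
      "big5" codec, which is an assumption of this example: it follows the
      standard Big5 table and is not necessarily identical to MariaDB's
      big5 table. The unassigned character C840 is deliberately left out.

```python
# Offline cross-check of the expected HEX() value for the assigned
# character, using Python's built-in "big5" codec (not MariaDB's table).
ASSIGNED = "\u4e42"  # the character from the report, Big5 code C940

hex_code = ASSIGNED.encode("big5").hex().upper()
print(hex_code)  # C940

# Decoding the same two bytes gives the character back.
assert bytes.fromhex(hex_code).decode("big5") == ASSIGNED
```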

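      Note that the character with Big5 code C840 is unassigned
      (it has no Unicode mapping), while the character with Big5
      code C940 (乂, U+4E42) is assigned. In the queries, the first
      quoted literal is the raw C840 character; it cannot be displayed
      on this page, so it shows up here as an empty string or as '?'.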
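      The difference between the two codes can be pictured as a lookup in
      a Big5-to-Unicode table. The sketch below is hypothetical: the table
      covers only the two codes from this report (only C940 has an entry)
      and is not MariaDB's actual big5 table.

```python
from typing import Dict, Optional

# Hypothetical excerpt of a Big5 -> Unicode mapping table.
# C840 has no entry because it is unassigned.
BIG5_TO_UNICODE: Dict[int, str] = {
    0xC940: "\u4e42",
}


def to_unicode(code: int) -> Optional[str]:
    """Return the Unicode character for a Big5 code, or None if unassigned."""
    return BIG5_TO_UNICODE.get(code)


for code in (0xC840, 0xC940):
    ch = to_unicode(code)
    status = "unassigned" if ch is None else f"U+{ord(ch):04X}"
    print(f"{code:04X}: {status}")
# C840: unassigned
# C940: U+4E42
```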

      Now create an ENUM column whose values are the unassigned and the
      assigned character:

      LANG=zh_TW.big5 mysql --default-character-set=big5 --table test << END
      SET NAMES big5;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a ENUM('','乂') CHARACTER SET big5);
      SHOW CREATE TABLE t1;
      INSERT INTO t1 VALUES (''),('乂');
      SELECT HEX(a),a FROM t1;
      END
      

      The output will be:

      +-------+-----------------------------------------------------------------------------------------------------------------+
      | Table | Create Table                                                                                                    |
      +-------+-----------------------------------------------------------------------------------------------------------------+
      | t1    | CREATE TABLE `t1` (
        `a` enum('?','乂') CHARACTER SET big5 DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
      +-------+-----------------------------------------------------------------------------------------------------------------+
      +--------+------+
      | HEX(a) | a    |
      +--------+------+
      | C840   |    |
      | C940   | 乂   |
      +--------+------+
      

      Note that the unassigned character is converted to a question mark
      in the SHOW CREATE TABLE output, while INSERT and SELECT handle it
      correctly: HEX(a) still returns C840.
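      A toy model helps explain why only SHOW CREATE TABLE is affected,
      assuming (as the output suggests) that the table definition is
      converted to the client character set through Unicode, while row
      values are matched and returned as raw big5 bytes. An ENUM column
      stores a 1-based member index per row, and the member strings live
      in the table definition. The class EnumColumn and the mapping table
      are hypothetical and are not MariaDB code.

```python
from typing import Dict, List

# Hypothetical mapping table: C840 is unassigned, so it has no entry.
BIG5_TO_UNICODE: Dict[bytes, str] = {b"\xc9\x40": "\u4e42"}


class EnumColumn:
    """Toy model of a big5 ENUM column (hypothetical, not MariaDB code)."""

    def __init__(self, members: List[bytes]) -> None:
        self.members = members      # member strings, kept as raw big5 bytes
        self.rows: List[int] = []   # each row stores a 1-based member index

    def insert(self, value: bytes) -> None:
        # Byte-for-byte match in big5: no Unicode step, so C840 works.
        self.rows.append(self.members.index(value) + 1)

    def select_hex(self) -> List[str]:
        # Like SELECT HEX(a): return the stored member bytes unchanged.
        return [self.members[i - 1].hex().upper() for i in self.rows]

    def show_create(self) -> str:
        # The definition goes through Unicode: unmapped codes become '?'.
        shown = [BIG5_TO_UNICODE.get(m, "?") for m in self.members]
        return "enum(" + ",".join(f"'{s}'" for s in shown) + ")"


col = EnumColumn([b"\xc8\x40", b"\xc9\x40"])
col.insert(b"\xc8\x40")
col.insert(b"\xc9\x40")
print(col.select_hex())  # ['C840', 'C940']
print(col.show_create().startswith("enum('?',"))  # True
```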

      Now dump and restore:

      mysqldump --socket=/tmp/mysql.sock test >t1.sql
      mysql -e "drop table t1" test
      mysql test <t1.sql
      mysql -e "select hex(a),a from t1" test
      

      The output will be:

      +--------+------+
      | hex(a) | a    |
      +--------+------+
      | 3F     | ?    |
      | C940   | 乂   |
      +--------+------+
      

      The unassigned character is lost: after the restore, the first row
      contains 3F (a literal question mark) instead of C840.
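      The loss can be traced to the same conversion. Assuming the dump file
      is written in a Unicode character set (utf8), the unassigned value is
      written as '?' both in the ENUM definition and in the INSERT
      statement, and on restore '?' is just the ASCII question mark (0x3F).
      This hypothetical sketch models only those two conversion steps,
      using an illustrative two-code table rather than MariaDB's.

```python
from typing import Dict, List

# Hypothetical table: C840 is unassigned, C940 maps to U+4E42.
BIG5_TO_UNICODE: Dict[bytes, str] = {b"\xc9\x40": "\u4e42"}
UNICODE_TO_BIG5: Dict[str, bytes] = {u: b for b, u in BIG5_TO_UNICODE.items()}


def dump_value(raw: bytes) -> str:
    """big5 bytes -> dump text; unmapped codes are written as '?'."""
    return BIG5_TO_UNICODE.get(raw, "?")


def restore_value(text: str) -> bytes:
    """Dump text -> big5 bytes; ASCII such as '?' is stored as-is."""
    if text in UNICODE_TO_BIG5:
        return UNICODE_TO_BIG5[text]
    return text.encode("ascii")


stored: List[bytes] = [b"\xc8\x40", b"\xc9\x40"]
restored = [restore_value(dump_value(v)) for v in stored]
print([v.hex().upper() for v in restored])  # ['3F', 'C940']
```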

          People

            Assignee: Alexander Barkov (bar)
            Reporter: Alexander Barkov (bar)
            Votes: 0
            Watchers: 2
