  1. MariaDB Server
  2. MDEV-28078

Garbage on multiple equal ENUMs with tricky character sets

Details

    Description

      I create a table with two similar ENUM columns, both using CHARACTER SET utf32:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (
        c1 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a',
        c2 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a' 
      );
      SHOW CREATE TABLE t1;
      

      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Table | Create Table                                                                                                                                                                |
      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | t1    | CREATE TABLE `t1` (
        `c1` enum('??','??') CHARACTER SET utf32 DEFAULT '??',
        `c2` enum('??','??') CHARACTER SET utf32 DEFAULT '??'
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      Notice that SHOW CREATE TABLE returns garbage instead of the ENUM values.

      The problem happens in this piece of the code in table.cc:

          if (interval_nr && charset->mbminlen > 1)
          {
            /* Unescape UCS2 intervals from HEX notation */
            TYPELIB *interval= share->intervals + interval_nr - 1;
            unhex_type2(interval);
      

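      As the two TYPELIBs are equal, only one copy of the TYPELIB is stored in the FRM file. But unhex_type2() is called twice on that single copy, once for each column, so the second call is applied to data that has already been unhexed.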
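The double call described above can be modelled with a small sketch. This is a toy model, not the server's code: `Typelib`, `hex_digit`, `unhex_in_place` and `open_with_shared_typelib` are hypothetical names, and mapping non-hex bytes to 0 is an assumption made only to keep the model deterministic. It shows that unhexing the same shared buffer twice no longer yields the 4-byte utf32 value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a TYPELIB: a list of ENUM value buffers.
struct Typelib {
    std::vector<std::string> names;
};

// Value of one hex digit; non-hex characters map to 0 in this toy model
// (an assumption, not the server's behaviour).
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;
}

// Toy in-place unhex: every two input bytes become one output byte,
// so each call halves the length of every value.
static void unhex_in_place(Typelib &t) {
    for (std::string &name : t.names) {
        std::string out;
        for (std::size_t i = 0; i + 1 < name.size(); i += 2)
            out.push_back(static_cast<char>(
                (hex_digit(name[i]) << 4) | hex_digit(name[i + 1])));
        name = out;
    }
}

// Two columns point at one deduplicated Typelib, and each unhexes it once.
static Typelib open_with_shared_typelib() {
    Typelib shared;
    shared.names = {"00000061", "00000062"};  // utf32 'a' and 'b' in hex
    Typelib *c1 = &shared;
    Typelib *c2 = &shared;
    unhex_in_place(*c1);  // correct: 8 hex digits -> 4 bytes
    unhex_in_place(*c2);  // wrong: decodes the already decoded bytes
    return shared;
}
```

In this model a single call turns `00000061` into the four bytes `00 00 00 61`, while the shared-copy path leaves only two bytes, which cannot be a valid utf32 character.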

      Note that TYPELIBs for tricky character sets like utf32 are stored in HEX notation. So a related problem is repeatable if I use a latin1 ENUM column whose values are equal to the HEX representations of the utf32 ENUM column's values:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (
        c1 ENUM ('00000061','00000062') DEFAULT '00000061' COLLATE latin1_bin,
        c2 ENUM ('a','b') DEFAULT 'a' COLLATE utf32_general_ci
      );
      SHOW CREATE TABLE t1;
      

      +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Table | Create Table                                                                                                                                                                                                |
      +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | t1    | CREATE TABLE `t1` (
        `c1` enum('\0\0\0a','\0\0\0b') CHARACTER SET latin1 COLLATE latin1_bin DEFAULT '\0\0\0a',
        `c2` enum('a','b') CHARACTER SET utf32 DEFAULT 'a'
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
      +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      As in the previous example, only one copy of the TYPELIB is stored in the FRM file (because the two TYPELIBs are binary equal).
      unhex_type2() is called for this TYPELIB to unescape the utf32 column values, but the latin1 column points to the same TYPELIB, so its values get unescaped as well.
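The aliasing in this second case can be sketched as follows. This is a simplified model under stated assumptions, not the server's code and not necessarily the fix that was pushed: `EnumValues`, `Column`, `hex_value`, `unhex_values` and both `latin1_value_*` functions are hypothetical names. The second function shows one conceivable way to avoid the problem, giving the utf32 column a private copy before unhexing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a TYPELIB: the ENUM values of one column.
using EnumValues = std::vector<std::string>;

// Value of one hex digit; non-hex characters map to 0 (toy assumption).
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;
}

// Toy unhex: decode each value from hex digits to raw bytes.
static void unhex_values(EnumValues &values) {
    for (std::string &v : values) {
        std::string out;
        for (std::size_t i = 0; i + 1 < v.size(); i += 2)
            out.push_back(static_cast<char>(
                (hex_value(v[i]) << 4) | hex_value(v[i + 1])));
        v = out;
    }
}

struct Column {
    bool needs_unhex;    // stands in for charset->mbminlen > 1
    EnumValues *values;  // may alias another column's values
};

// Both columns alias the single stored copy: unhexing for the utf32
// column also rewrites what the latin1 column sees.
static std::string latin1_value_when_shared() {
    EnumValues stored = {"00000061", "00000062"};
    Column c1{false, &stored};  // latin1_bin: the text is literal
    Column c2{true, &stored};   // utf32: the text is hex-encoded
    if (c2.needs_unhex) unhex_values(*c2.values);
    return (*c1.values)[0];
}

// One conceivable workaround: unhex a private copy for the utf32 column.
static std::string latin1_value_with_private_copy() {
    EnumValues stored = {"00000061", "00000062"};
    EnumValues utf32_copy = stored;
    Column c1{false, &stored};
    Column c2{true, &utf32_copy};
    if (c2.needs_unhex) unhex_values(*c2.values);
    return (*c1.values)[0];
}
```

With the shared copy, the latin1 value becomes the four bytes `\0\0\0a`, which matches the SHOW CREATE TABLE output above; with a private copy it stays `00000061`.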

          Activity

            holyfoot Alexey Botchkov added a comment - discussed on slack.

            holyfoot Alexey Botchkov added a comment - ok to push with the recent corrections

            People

              bar Alexander Barkov