[MDEV-28078] Garbage on multiple equal ENUMs with tricky character sets Created: 2022-03-16  Updated: 2022-06-16  Resolved: 2022-03-18

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
Fix Version/s: 10.2.44, 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Critical
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-28062 Assertion `(length % 4) == 0' failed ... Closed
Duplicate
is duplicated by MDEV-28062 Assertion `(length % 4) == 0' failed ... Closed
Relates
relates to MDEV-28498 Incorrect information in file: './tes... In Review

 Description   

I create a table with two identical ENUM columns, both using CHARACTER SET utf32:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a',
  c2 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a' 
);
SHOW CREATE TABLE t1;

+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `c1` enum('??','??') CHARACTER SET utf32 DEFAULT '??',
  `c2` enum('??','??') CHARACTER SET utf32 DEFAULT '??'
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Notice that SHOW CREATE TABLE returns garbage instead of the ENUM values.

The problem is in this piece of code in table.cc:

    if (interval_nr && charset->mbminlen > 1)
    {
      /* Unescape UCS2 intervals from HEX notation */
      TYPELIB *interval= share->intervals + interval_nr - 1;
      unhex_type2(interval);
    }

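Because the two TYPELIBs are equal, only one copy of this TYPELIB is stored in the FRM file, and both columns point to it. But unhex_type2() is called once per column, i.e. twice on the same TYPELIB: the second call treats the already unescaped bytes as HEX text again and turns them into garbage.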
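The following C++ sketch models the double unescaping in isolation. Typelib, Field, hex_digit(), unhex(), load_buggy() and load_fixed() are hypothetical simplifications, not MariaDB's real TYPELIB/TABLE_SHARE code or unhex_type2(); in particular, mapping non-hex bytes to 0 is an assumption of the sketch. It shows that unescaping a shared interval once per column shrinks the utf32 value 'a' from the 4 bytes "\0\0\0a" to 2 bytes, while remembering which intervals were already unescaped keeps the value intact.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model; NOT MariaDB's TYPELIB/TABLE_SHARE.
struct Typelib {
  std::vector<std::string> values;  // ENUM values as read from the FRM
};

struct Field {
  std::size_t interval_nr;  // 1-based index into the shared interval list
  unsigned mbminlen;        // 4 for utf32, 1 for latin1
};

// Value of one hex digit. Non-hex bytes map to 0 here; this is an
// assumption of the sketch, not necessarily what unhex_type2() does.
static int hex_digit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return 0;
}

// Decode two hex digits per byte, e.g. "00000061" -> "\0\0\0a".
static std::string unhex(const std::string &hex) {
  std::string out;
  for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
    out.push_back(static_cast<char>((hex_digit(hex[i]) << 4) |
                                    hex_digit(hex[i + 1])));
  return out;
}

// Buggy: unescapes the interval once per field, so a shared one is
// unescaped twice.
void load_buggy(std::vector<Typelib> &intervals,
                const std::vector<Field> &fields) {
  for (const Field &f : fields) {
    assert(f.interval_nr <= intervals.size());
    if (f.interval_nr && f.mbminlen > 1)
      for (std::string &v : intervals[f.interval_nr - 1].values) v = unhex(v);
  }
}

// Guarded: remembers which intervals were already unescaped.
void load_fixed(std::vector<Typelib> &intervals,
                const std::vector<Field> &fields) {
  std::vector<bool> unescaped(intervals.size(), false);
  for (const Field &f : fields) {
    assert(f.interval_nr <= intervals.size());
    if (f.interval_nr && f.mbminlen > 1 && !unescaped[f.interval_nr - 1]) {
      for (std::string &v : intervals[f.interval_nr - 1].values) v = unhex(v);
      unescaped[f.interval_nr - 1] = true;
    }
  }
}
```

Such a once-per-interval guard alone would not cover the second example, where columns with different character sets share one TYPELIB.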

Note that TYPELIBs for tricky character sets such as utf32 are stored in HEX notation. So the same problem is reproducible with a latin1 ENUM column whose values are equal to the HEX representations of the values of a utf32 ENUM column:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 ENUM ('00000061','00000062') DEFAULT '00000061' COLLATE latin1_bin,
  c2 ENUM ('a','b') DEFAULT 'a' COLLATE utf32_general_ci
);
SHOW CREATE TABLE t1;

+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `c1` enum('\0\0\0a','\0\0\0b') CHARACTER SET latin1 COLLATE latin1_bin DEFAULT '\0\0\0a',
  `c2` enum('a','b') CHARACTER SET utf32 DEFAULT 'a'
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

As in the previous example, only one copy of the TYPELIB is stored in the FRM file (because the two value lists are binary equal).
unhex_type2() is called on this TYPELIB to unescape the utf32 column values, but the latin1 column points to the same TYPELIB, so its values get unescaped as well and are displayed as '\0\0\0a' and '\0\0\0b'.
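A hypothetical C++ sketch of the writer side: if intervals are shared on binary equality of the stored bytes alone, the latin1 list '00000061','00000062' and the HEX-encoded utf32 list 'a','b' collapse into one entry. Column, StoredInterval, utf32_hex() and pack_intervals() are illustrative names, not MariaDB's FRM-writing code, and also comparing the HEX flag is only one possible way to keep the two lists apart; the actual fix may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical column model; NOT MariaDB's Create_field or FRM writer.
struct Column {
  std::vector<std::string> values;  // ENUM values as typed (ASCII here)
  unsigned mbminlen;                // 1 for latin1, 4 for utf32
};

// Encode one ASCII character as a UTF-32BE code unit in hex,
// e.g. 'a' (U+0061) -> "00000061". ASCII-only input is an assumption.
static std::string utf32_hex(char c) {
  assert(static_cast<unsigned char>(c) < 0x80);
  char buf[9];
  std::snprintf(buf, sizeof buf, "%08x",
                static_cast<unsigned>(static_cast<unsigned char>(c)));
  return buf;
}

struct StoredInterval {
  std::vector<std::string> text;  // bytes written to the FRM
  bool hex;                       // true if text must be unescaped on load
};

// Returns, per column, the 1-based index of the interval it references.
// check_hex == false shares intervals on binary equality alone (the
// behaviour the report describes); true also requires the HEX flag to match.
std::vector<std::size_t> pack_intervals(const std::vector<Column> &cols,
                                        bool check_hex,
                                        std::vector<StoredInterval> &out) {
  std::vector<std::size_t> nr;
  for (const Column &c : cols) {
    StoredInterval s{{}, c.mbminlen > 1};
    for (const std::string &v : c.values) {
      if (!s.hex) { s.text.push_back(v); continue; }
      std::string h;
      for (char ch : v) h += utf32_hex(ch);
      s.text.push_back(h);
    }
    std::size_t found = 0;  // 0 means "no equal interval stored yet"
    for (std::size_t i = 0; i < out.size() && !found; i++)
      if (out[i].text == s.text && (!check_hex || out[i].hex == s.hex))
        found = i + 1;
    if (!found) { out.push_back(s); found = out.size(); }
    nr.push_back(found);
  }
  return nr;
}
```

With the flag compared, each column gets its own interval, so unescaping the utf32 list can no longer affect the latin1 column.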



 Comments   
Comment by Alexey Botchkov [ 2022-03-16 ]

Discussed on Slack.

Comment by Alexey Botchkov [ 2022-03-17 ]

OK to push with the recent corrections.

Generated at Thu Feb 08 09:57:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.