[MDEV-28078] Garbage on multiple equal ENUMs with tricky character sets

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL)
Fix Version/s: 10.2.44, 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3
Component/s: Character Sets
Labels: None

Description

I create a table with two identical ENUM columns, both using CHARACTER SET utf32:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a',
  c2 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a'
);
SHOW CREATE TABLE t1;

+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `c1` enum('??','??') CHARACTER SET utf32 DEFAULT '??',
  `c2` enum('??','??') CHARACTER SET utf32 DEFAULT '??'
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Note that SHOW CREATE TABLE returns garbage instead of the ENUM values.

The problem happens in this piece of code in table.cc:

    if (interval_nr && charset->mbminlen > 1)
    {
      /* Unescape UCS2 intervals from HEX notation */
      TYPELIB *interval= share->intervals + interval_nr - 1;
      unhex_type2(interval);
    }

As the two TYPELIBs are equal, only one copy of the TYPELIB is stored in the FRM file. But unhex_type2() is called twice, once per column, so the second call unescapes data that has already been unescaped.
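
The double decoding can be illustrated outside the server with a minimal C++ sketch. Everything in it is a hypothetical stand-in rather than server code: `Interval` plays the role of a TYPELIB, `unhex_in_place()` is a simplified analogue of unhex_type2(), and `open_two_columns_sharing_one_interval()` mimics the per-column loop running over one shared interval. Treating non-hex bytes as digit 0 is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a TYPELIB: just the list of value strings.
using Interval = std::vector<std::string>;

// Decode one hex digit; non-hex bytes decode to 0 (a simplification).
static int hex_digit(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return 0;
}

// Replace every value with its hex-decoded bytes, in place.
void unhex_in_place(Interval &interval)
{
  for (std::string &value : interval)
  {
    assert(value.size() % 2 == 0); // HEX notation uses two digits per byte
    std::string decoded;
    for (std::size_t i= 0; i + 1 < value.size(); i+= 2)
      decoded.push_back(static_cast<char>(hex_digit(value[i]) * 16 +
                                          hex_digit(value[i + 1])));
    value= decoded;
  }
}

// Two columns refer to the same stored interval, so decoding runs once per column.
Interval open_two_columns_sharing_one_interval()
{
  Interval shared= {"00000061", "00000062"}; // utf32 'a','b' in HEX notation
  unhex_in_place(shared); // column c1: values become the 4-byte utf32 strings
  unhex_in_place(shared); // column c2: decodes the already-decoded bytes again
  return shared;
}
```

In this sketch the second call shrinks each four-byte value to two bytes that no longer encode 'a' or 'b' in utf32. The exact garbage the server produces may differ, since the real function is not reproduced here.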

Note that TYPELIBs for character sets with mbminlen > 1, such as utf32, are stored in HEX notation. So a related problem is repeatable if I use a latin1 ENUM column whose values are equal to the HEX representation of the utf32 ENUM column's values:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 ENUM ('00000061','00000062') DEFAULT '00000061' COLLATE latin1_bin,
  c2 ENUM ('a','b') DEFAULT 'a' COLLATE utf32_general_ci
);
SHOW CREATE TABLE t1;

+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `c1` enum('\0\0\0a','\0\0\0b') CHARACTER SET latin1 COLLATE latin1_bin DEFAULT '\0\0\0a',
  `c2` enum('a','b') CHARACTER SET utf32 DEFAULT 'a'
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

As in the previous example, only one copy of the TYPELIB is stored in the FRM file, because the two TYPELIBs are binary equal.
unhex_type2() is called on this TYPELIB to unescape the utf32 column's values, but the latin1 column points to the same TYPELIB, so its values are unescaped as well.
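
One way to avoid this cross-character-set sharing would be to reuse a stored interval only when the HEX-encoding flag matches as well as the bytes. The C++ sketch below illustrates that idea under stated assumptions: `IntervalPool` and `intern()` are hypothetical names, and this is not claimed to be the fix that was applied in the server.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Values = std::vector<std::string>;

// Hypothetical interval pool: an entry is reused only when both the stored
// bytes and the "stored in HEX notation" flag match.
struct IntervalPool
{
  std::vector<std::pair<bool, Values>> entries;

  // Returns the 1-based interval number for a column, adding an entry if needed.
  std::size_t intern(const Values &stored, bool hex_encoded)
  {
    assert(!stored.empty()); // an ENUM has at least one value
    for (std::size_t i= 0; i < entries.size(); i++)
      if (entries[i].first == hex_encoded && entries[i].second == stored)
        return i + 1;
    entries.emplace_back(hex_encoded, stored);
    return entries.size();
  }
};
```

With such a key, the latin1 column storing '00000061' literally and the utf32 column storing 'a' in HEX notation would get different interval numbers, while two identical utf32 columns would still share one entry and so would still need protection against being unescaped twice.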

Attachments

Issue Links

blocks

MDEV-28062 Assertion `(length % 4) == 0' failed in my_lengthsp_utf32 on INSERT..SELECT

Closed

is duplicated by

MDEV-28062 Assertion `(length % 4) == 0' failed in my_lengthsp_utf32 on INSERT..SELECT

Closed

relates to

MDEV-28498 Incorrect information in file: './test/t0.frm' on CREATE TABLE

In Review

Activity


Alexander Barkov created issue - 2022-03-16 07:53

Alexander Barkov made changes - 2022-03-16 07:54

Field	Original Value	New Value
Link		This issue blocks ~~MDEV-28062~~ [ ~~MDEV-28062~~ ]

Alexander Barkov made changes - 2022-03-16 08:02

Priority

Major [ 3 ]

Critical [ 2 ]

Alexander Barkov made changes - 2022-03-16 10:06

Description

Alexander Barkov made changes - 2022-03-16 10:13

Description

Alexander Barkov made changes - 2022-03-16 10:45

Status

Open [ 1 ]

In Progress [ 3 ]

Alexander Barkov made changes - 2022-03-16 10:45

Assignee	Alexander Barkov [ bar ]	Alexey Botchkov [ holyfoot ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Alexey Botchkov made changes - 2022-03-16 17:09

Assignee	Alexey Botchkov [ holyfoot ]	Alexander Barkov [ bar ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Sergei Golubchik made changes - 2022-03-17 22:17

Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.4 [ 22408 ]
Fix Version/s		10.5 [ 23123 ]
Fix Version/s		10.6 [ 24028 ]
Fix Version/s		10.7 [ 24805 ]

Alexander Barkov made changes - 2022-03-18 05:11

issue.field.resolutiondate

2022-03-18 05:11:08.0

2022-03-18 05:11:08.22

Alexander Barkov made changes - 2022-03-18 05:11

Fix Version/s		10.2.44 [ 27514 ]
Fix Version/s		10.3.35 [ 27512 ]
Fix Version/s		10.4.25 [ 27510 ]
Fix Version/s		10.5.16 [ 27508 ]
Fix Version/s		10.6.8 [ 27506 ]
Fix Version/s		10.7.4 [ 27504 ]
Fix Version/s		10.8.3 [ 27502 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Fix Version/s	10.5 [ 23123 ]
Fix Version/s	10.6 [ 24028 ]
Fix Version/s	10.7 [ 24805 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Alexander Barkov made changes - 2022-04-08 06:00

Link

This issue relates to ~~MDEV-28062~~ [ ~~MDEV-28062~~ ]

Alexander Barkov made changes - 2022-04-08 06:05

Link

This issue relates to ~~MDEV-28062~~ [ ~~MDEV-28062~~ ]

Alexander Barkov made changes - 2022-04-08 08:25

Link

This issue duplicates ~~MDEV-28062~~ [ ~~MDEV-28062~~ ]

Alexander Barkov made changes - 2022-04-08 08:26

Link

This issue is duplicated by ~~MDEV-28062~~ [ ~~MDEV-28062~~ ]

Alexander Barkov made changes - 2022-04-08 08:26

Link

This issue duplicates ~~MDEV-28062~~ [ ~~MDEV-28062~~ ]

Alexander Barkov made changes - 2022-06-16 07:00

Link

This issue relates to MDEV-28498 [ MDEV-28498 ]

People

Assignee: Alexander Barkov
Reporter: Alexander Barkov
Votes: 0
Watchers: 4

Dates

Created: 2022-03-16 07:53
Updated: 2022-06-16 07:00
Resolved: 2022-03-18 05:11
