Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Versions: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL)
Description
I create a table with two similar ENUM columns, both using CHARACTER SET utf32:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a',
  c2 ENUM ('a','b') CHARACTER SET utf32 DEFAULT 'a'
);
SHOW CREATE TABLE t1;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
  `c1` enum('??','??') CHARACTER SET utf32 DEFAULT '??',
  `c2` enum('??','??') CHARACTER SET utf32 DEFAULT '??'
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Note that SHOW CREATE TABLE returns garbage instead of the ENUM values.
The problem is in this piece of code in table.cc:
if (interval_nr && charset->mbminlen > 1)
{
  /* Unescape UCS2 intervals from HEX notation */
  TYPELIB *interval= share->intervals + interval_nr - 1;
  unhex_type2(interval);
Because the two TYPELIBs are equal, only one copy of the TYPELIB is stored in the FRM file. But unhex_type2() is called on it twice, once for each column.
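The effect of decoding a shared TYPELIB twice can be sketched in isolation. The following is a simplified model, not the server's code: `Typelib`, `Field`, `unhex_in_place` and `decode_per_field` are hypothetical names, values are held in `std::string` rather than C buffers, and non-hex bytes are assumed to decode as 0. It relies on the values being stored as hex digits for utf32 columns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the server's TYPELIB:
// only the value strings, with std::string tracking each length.
struct Typelib {
    std::vector<std::string> names;
};

// Hypothetical per-column metadata: 1-based interval number (0 = none)
// and the minimum character width of the column's character set.
struct Field {
    std::size_t interval_nr;
    unsigned mbminlen;
};

// Value of one hex digit; any other byte counts as 0 in this model.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;
}

// Simplified model of unhexing: every two input bytes become one
// output byte, so each call halves the length of every value.
void unhex_in_place(Typelib &t) {
    for (std::string &name : t.names) {
        std::string out;
        for (std::size_t i = 0; i + 1 < name.size(); i += 2)
            out.push_back(static_cast<char>((hex_value(name[i]) << 4) |
                                            hex_value(name[i + 1])));
        name = out;
    }
}

// Same shape as the quoted condition: one unhex per column, whether or
// not another column has already decoded the same interval slot.
void decode_per_field(std::vector<Typelib> &intervals,
                      const std::vector<Field> &fields) {
    for (const Field &f : fields) {
        if (f.interval_nr && f.mbminlen > 1) {
            assert(f.interval_nr <= intervals.size());
            unhex_in_place(intervals[f.interval_nr - 1]);
        }
    }
}
```

In this model, one utf32 column referring to the stored value '00000061' decodes it to the four bytes of utf32 'a'. Two columns referring to the same slot decode it twice and leave a two-byte value, which is not a whole utf32 character.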
Note that TYPELIBs for tricky character sets like utf32 are stored in HEX notation. So the same problem is repeatable with a latin1 ENUM column whose values are equal to the HEX representation of the utf32 ENUM column's values:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 ENUM ('00000061','00000062') DEFAULT '00000061' COLLATE latin1_bin,
  c2 ENUM ('a','b') DEFAULT 'a' COLLATE utf32_general_ci
);
SHOW CREATE TABLE t1;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
  `c1` enum('\0\0\0a','\0\0\0b') CHARACTER SET latin1 COLLATE latin1_bin DEFAULT '\0\0\0a',
  `c2` enum('a','b') CHARACTER SET utf32 DEFAULT 'a'
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
As in the previous example, only one copy of the TYPELIB is stored in the FRM file (because the two TYPELIBs are binary equal).
unhex_type2() is called on this TYPELIB to unescape the values of the utf32 column, but the latin1 column points to the same TYPELIB, so its values are unescaped as well.
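How a latin1 column and a utf32 column can end up sharing one stored TYPELIB can be sketched as follows. This is an illustrative model only: `EnumColumn`, `stored_form` and `assign_intervals` are hypothetical names, the input is assumed to be ASCII, and the only property taken from the description above is that stored value lists are merged when they are binary equal, without regard to character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an ENUM column as declared by the user.
struct EnumColumn {
    std::vector<std::string> values;  // ASCII-only in this sketch
    unsigned mbminlen;                // 1 for latin1, 4 for utf32
};

// Hex-encode raw bytes, two digits per byte.
std::string to_hex(const std::string &bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Encode an ASCII string as big-endian utf32: three zero bytes
// followed by the character itself.
std::string ascii_to_utf32(const std::string &s) {
    std::string out;
    for (char c : s) {
        assert(static_cast<unsigned char>(c) < 0x80);  // ASCII assumption
        out.append(3, '\0');
        out.push_back(c);
    }
    return out;
}

// Stored form in this model: hex notation when mbminlen > 1,
// the raw values otherwise.
std::vector<std::string> stored_form(const EnumColumn &col) {
    std::vector<std::string> out;
    for (const std::string &v : col.values)
        out.push_back(col.mbminlen > 1 ? to_hex(ascii_to_utf32(v)) : v);
    return out;
}

// Give each column a 1-based interval number, reusing an existing slot
// when the stored forms are byte-for-byte equal. The character set is
// deliberately ignored, which is what lets the two columns collide.
std::vector<std::size_t> assign_intervals(
    const std::vector<EnumColumn> &cols,
    std::vector<std::vector<std::string>> &intervals) {
    std::vector<std::size_t> numbers;
    for (const EnumColumn &col : cols) {
        const std::vector<std::string> form = stored_form(col);
        std::size_t nr = 0;
        for (std::size_t i = 0; i < intervals.size() && nr == 0; ++i)
            if (intervals[i] == form) nr = i + 1;
        if (nr == 0) {
            intervals.push_back(form);
            nr = intervals.size();
        }
        numbers.push_back(nr);
    }
    return numbers;
}
```

Passing the two columns from the second example (latin1 values '00000061','00000062' and utf32 values 'a','b') yields a single interval slot that both columns refer to. Any in-place decoding done on behalf of the utf32 column is then visible through the latin1 column too.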
Issue Links
- blocks
  - MDEV-28062 Assertion `(length % 4) == 0' failed in my_lengthsp_utf32 on INSERT..SELECT (Closed)
- is duplicated by
  - MDEV-28062 Assertion `(length % 4) == 0' failed in my_lengthsp_utf32 on INSERT..SELECT (Closed)
- relates to
  - MDEV-28498 Incorrect information in file: './test/t0.frm' on CREATE TABLE (In Review)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue blocks |
Priority | Major [ 3 ] | Critical [ 2 ] |
Description | Earlier revision of the description above, with the columns named ENABLED and HISTORY | Columns renamed to c1 and c2 |
Description | Revision with SET sql_mode=''; at the start of each SQL script | Current description, without SET sql_mode=''; |
Status | Open [ 1 ] | In Progress [ 3 ] |
Assignee | Alexander Barkov [ bar ] | Alexey Botchkov [ holyfoot ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Alexey Botchkov [ holyfoot ] | Alexander Barkov [ bar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] |
issue.field.resolutiondate | 2022-03-18 05:11:08.0 | 2022-03-18 05:11:08.22 |
Fix Version/s | 10.2.44 [ 27514 ] | |
Fix Version/s | 10.3.35 [ 27512 ] | |
Fix Version/s | 10.4.25 [ 27510 ] | |
Fix Version/s | 10.5.16 [ 27508 ] | |
Fix Version/s | 10.6.8 [ 27506 ] | |
Fix Version/s | 10.7.4 [ 27504 ] | |
Fix Version/s | 10.8.3 [ 27502 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue duplicates |
Link | This issue is duplicated by |
Link | This issue duplicates |
Link | This issue relates to MDEV-28498 [ MDEV-28498 ] |