Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.4(EOL), 10.5, 10.6
-
None
Description
set names utf8; |
create table t1 (a text collate utf8_general_ci unique); |
show create table t1; |
insert t1 values ('a'); |
insert t1 values ('ä'); |
select * from t1; |
select distinct * from t1; |
select 'a' = 'ä'; |
drop table t1; |
shows that one can insert two characters into a long unique index that compare equally.
It happens because calc_hash_for_unique() hashes string length in bytes together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.
The effect of this bug is that if one has a table with
CREATE TABLE
...
xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
...
UNIQUE INDEX `xxx` (column,...) USING HASH...
Then the following problems may occur:
- The table will accept duplicates for UTF8 strings that are different but should compare equal.
- If master and slave are of different versions where the bug is fixed on one of them or the hash value is different (see
MDEV-32093) then the following can also happen: - Insert works on master (which does not notice the duplicate key) but fails on slave (which notices the duplicate key)
Fix is to drop and add back the unique index or run ALTER TABLE xxx ENGINE=InnoDB (for InnoDB tables) to reconstruct the hash index.
Attachments
Issue Links
- blocks
-
MDEV-29954 Unique hash key on column prefix is computed incorrectly
-
- Closed
-
- causes
-
MDEV-32016 change the hash used for hash unique
-
- Open
-
-
MDEV-32093 long uniques break old->new replication
-
- Closed
-
-
MDEV-35297 Table will encounter rebuild during minor version upgrade if contains hashing index on TEXT type unique key
-
- Confirmed
-
- includes
-
MDEV-25779 long uniques aren't 32/64-bit portable
-
- Closed
-
- relates to
-
MDEV-371 Unique indexes for blobs
-
- Closed
-
-
MDEV-28190 sql_mode makes MDEV-371 virtual column expressions nondeterministic
-
- Closed
-
-
MDEV-25779 long uniques aren't 32/64-bit portable
-
- Closed
-
-
MDEV-29345 update case insensitive (large) unique key with insensitive change of value - duplicate key
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Affects Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.5 [ 23123 ] | |
Affects Version/s | 10.6 [ 24028 ] |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] |
Assignee | Nikita Malyavin [ nikitamalyavin ] |
Description |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert to characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. |
Link |
This issue relates to |
Priority | Major [ 3 ] | Critical [ 2 ] |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] |
Assignee | Nikita Malyavin [ nikitamalyavin ] | Alexander Barkov [ bar ] |
Link |
This issue relates to |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Alexander Barkov [ bar ] | Sergei Golubchik [ serg ] |
Link |
This issue blocks |
Link | This issue causes MDEV-30095 [ MDEV-30095 ] |
Link | This issue causes MDEV-30095 [ MDEV-30095 ] |
Assignee | Sergei Golubchik [ serg ] | Alexander Barkov [ bar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Alexander Barkov [ bar ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Sergei Golubchik [ serg ] | Alexander Barkov [ bar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Component/s | Data types [ 13906 ] | |
Fix Version/s | 10.4.28 [ 28509 ] | |
Fix Version/s | 10.5.19 [ 28511 ] | |
Fix Version/s | 10.6.12 [ 28513 ] | |
Fix Version/s | 10.7.8 [ 28515 ] | |
Fix Version/s | 10.8.7 [ 28517 ] | |
Fix Version/s | 10.9.5 [ 28519 ] | |
Fix Version/s | 10.10.3 [ 28521 ] | |
Fix Version/s | 10.11.2 [ 28523 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link | This issue causes MDEV-32016 [ MDEV-32016 ] |
Link |
This issue relates to |
Link |
This issue includes |
Link | This issue causes MENT-1935 [ MENT-1935 ] |
Link |
This issue causes |
Description |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. The effect of this bug is that if one has a table with CREATE TABLE ... xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci', ... UNIQUE INDEX `xxx` (column,...) USING HASH... Then the following problems may occur: - The table will accept duplicates even if the value should be compared as |
Description |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. The effect of this bug is that if one has a table with CREATE TABLE ... xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci', ... UNIQUE INDEX `xxx` (column,...) USING HASH... Then the following problems may occur: - The table will accept duplicates even if the value should be compared as |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. The effect of this bug is that if one has a table with CREATE TABLE ... xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci', ... UNIQUE INDEX `xxx` (column,...) USING HASH... Then the following problems may occur: - The table will accept duplicates for UTF8 strings that are different but should compare equal. |
Description |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. The effect of this bug is that if one has a table with CREATE TABLE ... xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci', ... UNIQUE INDEX `xxx` (column,...) USING HASH... Then the following problems may occur: - The table will accept duplicates for UTF8 strings that are different but should compare equal. |
{code:sql}
set names utf8; create table t1 (a text collate utf8_general_ci unique); show create table t1; insert t1 values ('a'); insert t1 values ('ä'); select * from t1; select distinct * from t1; select 'a' = 'ä'; drop table t1; {code} shows that one can insert two characters into a long unique index that compare equally. It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. The effect of this bug is that if one has a table with CREATE TABLE ... xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci', ... UNIQUE INDEX `xxx` (column,...) USING HASH... Then the following problems may occur: - The table will accept duplicates for UTF8 strings that are different but should compare equal. - If master and slave are of different versions where the bug is fixed on one of them or the hash value is different (see - Insert works on master (which does not notice the duplicate key) but fails on slave (which notices the duplicate key) Fix is to drop and add back the unique index or run ALTER TABLE xxx ENGINE=InnoDB (for InnoDB tables) to reconstruct the hash index. |
Zendesk Related Tickets | 144370 136244 |
Link | This issue causes MDEV-35297 [ MDEV-35297 ] |
Neither `LENGTH` nor `CHAR_LENGTH` is valid. Consider these three characters; they are all "equal"
> SELECT 'ä'='a', 'ä'='a';
+----------+-----------+
| 'ä'='a' | 'ä'='a' |
+----------+-----------+
| 1 | 1 |
+----------+-----------+
1 row in set (0.01 sec)
They are
61 L LATIN SMALL LETTER A
C3A4 L LATIN SMALL LETTER A WITH DIAERESIS
61 plus CC88 NSM COMBINING DIAERESIS -- 3 bytes, 2 "characters"
It may be wise to pull the plug on `UNIQUE` for `TEXT` until this tough problem can be thought through thoroughly.
(BLOB should not have this problem.)