[MDEV-27653] long uniques don't work with unicode collations Created: 2022-01-27 Updated: 2024-01-28 Resolved: 2023-01-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data types |
| Affects Version/s: | 10.4, 10.5, 10.6 |
| Fix Version/s: | 10.11.2, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Sergei Golubchik | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
shows that one can insert two characters into a long unique index that compare equally. It happens because calc_hash_for_unique() hashes string length in bytes together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different. The effect of this bug is that if one has a table with Then the following problems may occur:
Fix is to drop and add back the unique index or run ALTER TABLE xxx ENGINE=InnoDB (for InnoDB tables) to reconstruct the hash index. |
| Comments |
| Comment by Rick James [ 2022-01-27 ] | |||||||||||||||||||
|
Neither `LENGTH` nor `CHAR_LENGTH` is valid. Consider these three characters; they are all "equal"
They are
It may be wise to pull the plug on `UNIQUE` for `TEXT` until this tough problem can be thought through thoroughly. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-09-08 ] | |||||||||||||||||||
|
For the record, I got this patch that should supposedly disable
I agree that it could be a good idea to simply disable In addition to what rjasdfiii mentioned, there are also things like this:
Contrary to what I expected, in this example the duplicates are identified correctly. | |||||||||||||||||||
| Comment by Alexander Barkov [ 2022-10-27 ] | |||||||||||||||||||
|
A similar problem is repeatable with this script:
Notice, the table contains two equal values, which is wrong. | |||||||||||||||||||
| Comment by Alexander Barkov [ 2022-10-27 ] | |||||||||||||||||||
|
One more problem: UNIQUE HASH keys on TIMESTAMP columns do not work in combination with CHECK TABLE if time_zone changes:
| |||||||||||||||||||
| Comment by Rick James [ 2022-10-27 ] | |||||||||||||||||||
|
Alexander – Can you (or someone else) explain what is going on to even catch the time_zone difference? Is TZ stored in the table definition? Is the timestamp "converted" at the wrong phase of the flow of the data? BTW, I don't get the error when testing with MySQL 8.0.31. Is there an MDEV to where MariaDB diverged from Oracle in TIMESTAMPs?
| |||||||||||||||||||
| Comment by Sergei Golubchik [ 2022-10-27 ] | |||||||||||||||||||
|
TIMESTAMP is stored as a number of seconds from epoch, in UTC. So it's a time zone independent. Time zone comes into effect when a timestamp is converted from a number to a string YYYY-MM-DD hh:mm:ss representation. At that point it's converted from UTC to the current time zone. The bug is that long unique was using the string representation of a timestamp, not its native numeric representation. In fact, it was using string representation for all columns and data types, but only for timestamp it has actually caused a bug, for other types it was just an unnecessary conversion. | |||||||||||||||||||
| Comment by Alexander Barkov [ 2022-10-28 ] | |||||||||||||||||||
|
Hello serg, Can you please review a patch: Thanks. | |||||||||||||||||||
| Comment by Alexander Barkov [ 2023-01-10 ] | |||||||||||||||||||
|
Hello serg, Please have a look into a new patch version: https://github.com/MariaDB/server/commit/06ce67c644b92a600b842744363e1d0acd8ce7d6 Thanks. | |||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-01-19 ] | |||||||||||||||||||
|
https://github.com/MariaDB/server/commit/38d6d0dc6651c11f39344b0fe7b0cece85e5e5c8 is ok to push | |||||||||||||||||||
| Comment by Rick James [ 2024-01-28 ] | |||||||||||||||||||
|
https://dba.stackexchange.com/questions/335247/unique-key-on-varchar-is-ignored-for-maria-db complains about not using a UNIQUE index. | |||||||||||||||||||
| Comment by Sergei Golubchik [ 2024-01-28 ] | |||||||||||||||||||
|
it's a complain that the optimizer doesn't use an index to optimize the query. This is correct, optimizer doesn't, this is not implemented (yet). But the UNIQUE isn't ignored, the uniqueness constraint holds, it applies on inserts and updates. But the optimizer cannot use it for SELECT. |