[MDEV-6170] Incorrect ordering with utf8_bin and utf8mb4_bin collations Created: 2014-04-24 Updated: 2014-04-28 Resolved: 2014-04-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.10 |
| Fix Version/s: | 10.0.11 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jeremy Cole | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | character-set | ||
| Attachments: |
|
| Description |
|
In MySQL 5.6 the utf8_bin and utf8mb4_bin collations were broken such that they ordered by only the high byte of the utf8 character (meaning ASCII sorts as though all characters were 0, resulting in random sort). This was fixed in the following commit (and maybe others): https://github.com/darnaut/mysql-server/commit/5fe5a5b This issue affects MariaDB 10 as well. See the attached simple test for this sorting bug. |
| Comments |
| Comment by Jeremy Cole [ 2014-04-24 ] | ||||||||||||||
|
Note that with only that patch applied, I get a failure in main.ctype_utf32: assertion failed at sql/filesort.cc:927 in void make_sortkey(Sort_param*, uchar*, uchar*): sort_field->length >= length Obviously some other change is required but I am not sure what. | ||||||||||||||
| Comment by Jeremy Cole [ 2014-04-25 ] | ||||||||||||||
|
Also note that the assert being hit seems correct; commenting it out causes utf32 to sort incorrectly and the test fails for that reason. On MySQL 5.6, the assert is never reached because `from != to` in the enclosing if(), which I think means the string is being reallocated elsewhere... | ||||||||||||||
| Comment by Jeremy Cole [ 2014-04-25 ] | ||||||||||||||
|
Note that the upstream bug was this one: | ||||||||||||||
| Comment by Alexander Barkov [ 2014-04-25 ] | ||||||||||||||
|
An SQL script demonstrating the problem:
Notice, the order is not descending. | ||||||||||||||
| Comment by Alexander Barkov [ 2014-04-25 ] | ||||||||||||||
|
A similar script (with COLLATE moved to the table definition and with LOWER instead of COLLATE in ORDER BY):
The order is not descending. | ||||||||||||||
| Comment by Alexander Barkov [ 2014-04-25 ] | ||||||||||||||
|
Pushed into 10.0 | ||||||||||||||
| Comment by Pavel Ivanov [ 2014-04-25 ] | ||||||||||||||
|
I guess the patch in question is http://bazaar.launchpad.net/~maria-captains/maria/10.0/revision/4164. | ||||||||||||||
| Comment by Jeremy Cole [ 2014-04-25 ] | ||||||||||||||
|
I'd guess the replication unsafety thing was because the "CREATE TABLE t1 ENGINE=MEMORY AS SELECT REPEAT('a',5) AS a LIMIT 0" being used in the test was being marked as unsafe for replication, despite having a LIMIT 0 making it deterministic. However, fixing that bug within this fix is kind of unexpected (and I would note that the new behavior is not being specifically tested here). | ||||||||||||||
| Comment by Jeremy Cole [ 2014-04-25 ] | ||||||||||||||
|
Bar: Could you comment as to why MY_CS_STRNXFRM is not needed for my_charset_utf8mb4_bin? And/or why the other changes to filesort were not necessary here? Trying to understand the differences. | ||||||||||||||
| Comment by Alexander Barkov [ 2014-04-28 ] | ||||||||||||||
|
Jeremy, you're right. A fix for The new behaviour IS tested in binlog_stm_ctype_cp932. By the way, the old behaviour of "LIMIT 0" was not tested, | ||||||||||||||
| Comment by Alexander Barkov [ 2014-04-28 ] | ||||||||||||||
|
Jeremy, MY_CS_STRNXFRM is needed for my_charset_utf8mb4_bin,
MY_CS_STRNXFRM is needed for all collations (including utf8_bin and utf8mb4_bin)
These transformation are done for performance purposes: to reduce the length needed to sort data. | ||||||||||||||
| Comment by Jeremy Cole [ 2014-04-28 ] | ||||||||||||||
|
Bar, Yes, I see that MY_CS_STRNXFRM is present already for utf8mb4, I was looking at a too-old version in my case. I do however disagree that the new behavior is specifically tested; it's indirectly/incidentally tested as part of the ctype test. It should get a new test, with a decent name, to test specifically that LIMIT 0 causes the warning to not be generated while e.g. LIMIT 1 does not. It doesn't matter what was there before, it matters what you're changing and that we're improving the situation. This should get a new, specific test. |