[MDEV-15624] Changing the default character set to utf8mb4 changes query evaluation in a very surprising way
Created: 2018-03-21 Updated: 2018-04-05 Resolved: 2018-04-04
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 5.5, 10.0, 10.1, 10.2.13, 10.2, 10.3 |
| Fix Version/s: | 5.5.60, 10.0.35, 10.1.33, 10.2.15, 10.3.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Martin Häcker | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | upstream-fixed |
| Environment: | Darwin crest.fritz.box 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64 |
| Issue Links: |
| Description |
Changing the default character set to utf8mb4 changes query evaluation in a very surprising way. (Please feel free to set a better title.)

This is actually a follow-up bug report to this: https://bitbucket.org/zzzeek/sqlalchemy/issues/4222/query-yields-different-result-via-sqla (see there for more detail).

Here's the problem: This query: ```select seq, replace(uuid(), "-", "") from seq_0_to_9;``` returns a completely different result depending on whether you connect to mysql with a character set of utf8 or utf8mb4. Here's an example:

This returns 10 UUIDs that are all the same. *I believe this to be wrong.*

Calling that same query with utf8 as the character set yields 10 different UUIDs, which I believe to be correct:

This also happens on at least current Fedora. (I can look up the details tomorrow.) On a further note, this result is |
| Comments |
| Comment by Elena Stepanova [ 2018-03-21 ] |
Thanks for the report and test case.
There is a difference in the EXPLAIN output, which might be related:
| Comment by Alexander Barkov [ 2018-04-03 ] |
The problem is also repeatable with a real table instead of a sequence:
Notice that all values in the second column are the same, which is wrong.
Notice that all values in the second column are different, which is correct.
| Comment by Alexander Barkov [ 2018-04-04 ] |
The same problem is repeatable with the INSERT function:
If I change the character set from utf8mb4 to utf8, it works fine.
| Comment by Alexander Barkov [ 2018-04-04 ] |
The same problem is repeatable with UUID_SHORT() as an argument to any string function that forces conversion of UUID_SHORT() to some other character set (e.g. to ucs2):
| Comment by Alexander Barkov [ 2018-04-04 ] |
The problem happens because Item_func_uuid_short::const_item() and Item_func_uuid::const_item() return true. They should return false.
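The comment above names the cause but not the mechanism. What follows is a minimal Python sketch of one plausible mechanism, not the server's actual code. It assumes (the report does not state this) that a character set conversion step wraps the function call and pre-computes any argument that reports itself as constant. `UuidItem`, `CharsetConverter` and `select_rows` are hypothetical names made up for this illustration; only `const_item()` mirrors a name from the report.

```python
import uuid


class UuidItem:
    """Toy stand-in for an expression node like UUID(); not MariaDB's class."""

    def __init__(self, claims_const):
        self.claims_const = claims_const

    def const_item(self):
        # Tells the evaluator whether one evaluation is valid for every row.
        return self.claims_const

    def val_str(self):
        # A fresh value on every call, as UUID() is expected to behave.
        return uuid.uuid4().hex


class CharsetConverter:
    """Hypothetical wrapper added when an argument needs another character set."""

    def __init__(self, arg):
        self.arg = arg
        # Assumed optimisation: convert a "constant" argument once, up front.
        self.cached = self._convert(arg.val_str()) if arg.const_item() else None

    @staticmethod
    def _convert(text):
        # Placeholder for a real conversion; a UTF-8 round trip keeps the text.
        return text.encode("utf-8").decode("utf-8")

    def val_str(self):
        if self.cached is not None:
            return self.cached
        return self._convert(self.arg.val_str())


def select_rows(expr, row_count):
    """Evaluate the expression once per row, like a SELECT over row_count rows."""
    return [expr.val_str() for _ in range(row_count)]


buggy = select_rows(CharsetConverter(UuidItem(claims_const=True)), 3)
fixed = select_rows(CharsetConverter(UuidItem(claims_const=False)), 3)
print(len(set(buggy)), len(set(fixed)))
```

In this toy model the first count is 1 and the second is 3. Evaluating `UuidItem` directly, without the converter, also gives distinct values even when it claims to be constant, which would be consistent with the utf8 connection behaving correctly if no conversion is needed there.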
| Comment by Alexander Barkov [ 2018-04-04 ] |
The problem is also repeatable with DISTINCT. All SELECT queries in the scripts below return one row, while they are expected to return three rows:
| Comment by Martin Häcker [ 2018-04-04 ] |
Out of interest, could you link the commit / the commits you fixed this in? I can't seem to find them at https://github.com/MariaDB/server
| Comment by Elena Stepanova [ 2018-04-04 ] |
dwt, you should be able to see the commit link in the JIRA issue itself, right panel, "Development" section. https://github.com/mariadb/server/commit/6beb08c7b67ed7610e95c0350f9f93005db1e055
| Comment by Martin Häcker [ 2018-04-05 ] |
Got it, thanks!
| Comment by Alexander Barkov [ 2018-04-05 ] |
You didn't see it because https://github.com/MariaDB/server displays the current branch, which is 10.3. I pushed the change to 5.5, so it's visible here: https://github.com/MariaDB/server/tree/5.5 Note, the patch has not been propagated to 10.3 yet. It will be, the next time we merge from 5.5 up to 10.3.