[MDEV-15182] Incorrect compare latin extended unicode symbol with charset utf8mb4 Created: 2018-02-02 Updated: 2018-02-05 Resolved: 2018-02-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 10.1.30 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Stas | Assignee: | Alexander Barkov |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | innodb | ||
| Environment: |
Linux CentOS 7, Windows 7 |
||
| Description |
|
Hello,
Execute:
We get an error. But the symbol S != Ş; these are different strings. |
| Comments |
| Comment by Elena Stepanova [ 2018-02-05 ] | ||||||||||||
|
I think it's one of the many mapping/matching/comparison rules for UTF-8; I'll leave it to bar to point at the precise rule (or to say there isn't one).
| ||||||||||||
| Comment by Alexander Barkov [ 2018-02-05 ] | ||||||||||||
|
utf8mb4_general_ci is accent insensitive, so this is the expected behaviour. Please use utf8mb4_thai_520_w2 if you need accent sensitive comparison. This script demonstrates that Ş != S when using utf8mb4_thai_520_w2:
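
A minimal sketch of such a check, not the original script. It assumes a session whose connection character set is utf8mb4 and a server version that ships utf8mb4_thai_520_w2; the column aliases are made up for illustration.

```sql
-- Sketch only: make sure the string literals are sent as utf8mb4.
SET NAMES utf8mb4;

-- utf8mb4_general_ci ignores accents, so S and Ş are treated as the same letter.
SELECT 'S' = 'Ş' COLLATE utf8mb4_general_ci AS general_ci_equal;

-- utf8mb4_thai_520_w2 is the collation suggested above for accent-sensitive comparison.
SELECT 'S' = 'Ş' COLLATE utf8mb4_thai_520_w2 AS thai_520_w2_equal;
```

Going by the explanation above, the first query should return 1 (equal) and the second 0 (not equal).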
| ||||||||||||
| Comment by Alexander Barkov [ 2018-02-05 ] | ||||||||||||
|
Another option is to use utf8mb4_turkish_ci. It implements the following rules:
It is accent-sensitive only for these specified letters and accent-insensitive for all other letters. It should work for Turkish (and most likely for Azerbaijani). |
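
A small sketch of how the utf8mb4_turkish_ci suggestion could be checked. It assumes a utf8mb4 connection; the column aliases are hypothetical, and the choice of A/Á as an example of an "other" letter is an assumption, not taken from the comment.

```sql
-- Sketch only: make sure the string literals are sent as utf8mb4.
SET NAMES utf8mb4;

SELECT
  -- Ş is one of the letters Turkish treats as distinct from its base letter.
  'S' = 'Ş' COLLATE utf8mb4_turkish_ci AS s_vs_s_cedilla,
  -- A letter outside the Turkish-specific set, compared with its accented form.
  'A' = 'Á' COLLATE utf8mb4_turkish_ci AS a_vs_a_acute;
```

If the description above holds, the first column should be 0 and the second 1.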