[MDEV-12110] Invalid Polish collation
Created: 2017-02-22 Updated: 2017-10-18 Resolved: 2017-10-18
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 10.0.27-galera |
| Fix Version/s: | N/A |
| Type: | Bug |
| Priority: | Major |
| Reporter: | Marek |
| Assignee: | Alexander Barkov |
| Resolution: | Not a Bug |
| Votes: | 0 |
| Labels: | None |
| Environment: | openSUSE 42.2 (x86_64) |

Description
Sample table:
Now, I don't understand why this query:
returns only 1 row. Forcing utf8_unicode_ci returns 3 results, but for:
there is again only 1 result. The letters Ł and ł are not mapped to l. This should work according to http://unicode.org/repos/cldr/tags/latest/common/collation/pl.xml

Comments

Comment by Alexander Barkov [ 2017-02-27 ]
This query returns only one row:
because 'a' and 'ą' are different letters: 'ą' is greater than 'a' on the primary level.

This query also returns only one row:
because:
So utf8_polish_ci works in full accordance with:

Now let's check utf8_unicode_ci. This query returns three rows:
because in the default weight table, 'a' and 'ą' are equal on the primary level.

This query returns only one row:
because 'l' and 'ł' were different letters in the default weight table for Unicode-4.x. Notice that the two letters have different primary weights:
Note 0F2E vs 0F36. utf8_unicode_ci is based on Unicode-4.x, so it inherits all Unicode-4.x behavior. Later versions of Unicode changed the default weight table. In Unicode-5.x, 'l' and 'ł' are equal on the primary level:
Notice that they have the same primary weight, 1262. So you can use this query to get the desired result:
It returns three rows.
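The comment's argument comes down to comparing primary weights. The following Python sketch is a simplified, hypothetical model of that idea, not MariaDB's implementation: a `_ci` comparison is treated as equality of primary weights alone, using only the 'l'/'ł' values quoted in the comment (1262 is assumed to be hexadecimal, like 0F2E and 0F36). The table names, the helper functions, and the three single-letter rows are illustrative assumptions, not the reporter's sample table.

```python
# Hypothetical model: a case-insensitive (*_ci) equality check reduced to
# comparing primary weights only. Only the 'l'/'ł' weights from the comment
# are modeled; real collations have full tables and several weight levels.

# Unicode 4.x default weights (basis of utf8_unicode_ci): 'l' and 'ł' differ.
# Uppercase letters share the lowercase primary weight, since case is a
# tertiary-level difference.
UCA4_PRIMARY = {"l": 0x0F2E, "L": 0x0F2E, "ł": 0x0F36, "Ł": 0x0F36}

# Unicode 5.x default weights: 'l' and 'ł' share one primary weight
# (1262, assumed hexadecimal like the 4.x values).
UCA5_PRIMARY = {"l": 0x1262, "L": 0x1262, "ł": 0x1262, "Ł": 0x1262}


def primary_key(text: str, table: dict[str, int]) -> list[int]:
    """Return the primary weight of each character (KeyError if unmodeled)."""
    return [table[ch] for ch in text]


def equal_ci(a: str, b: str, table: dict[str, int]) -> bool:
    """Strings compare equal at the primary level when their keys match."""
    return primary_key(a, table) == primary_key(b, table)


def matching_rows(rows: list[str], needle: str, table: dict[str, int]) -> list[str]:
    """Mimic `WHERE col = needle`: keep rows primary-equal to the needle."""
    return [row for row in rows if equal_ci(row, needle, table)]


# Hypothetical single-letter rows (not the reporter's sample table).
rows = ["l", "ł", "Ł"]
print(matching_rows(rows, "l", UCA4_PRIMARY))  # ['l']
print(matching_rows(rows, "l", UCA5_PRIMARY))  # ['l', 'ł', 'Ł']
```

Real collations also use secondary and tertiary levels, contractions, and language tailorings such as the Polish rules that make 'ą' greater than 'a' on the primary level; the sketch models only the primary-level equality that decides the row counts discussed above.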