[MDEV-21533] 'å' equals '[' in the latin1_swedish_ci collation Created: 2020-01-20 Updated: 2020-06-23 |
|
| Status: | Confirmed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 5.5, 10.1, 10.2, 10.3, 10.4 |
| Fix Version/s: | 10.5 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Marek Gibney | Assignee: | Alexander Barkov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | collation |
| Environment: | Debian 10 |
| Description |
|
The following code outputs both rows. I think it should only output the one starting with 'å'. (Tested on 10.3.18)
|
| Comments |
| Comment by Marko Mäkelä [ 2020-01-20 ] |
|
I can confirm the issue with mysql-test-run.pl, but the .test file must be encoded in ISO-8859-1. If it were encoded in UTF-8, it would be testing something else: 'Ã¥lesund' and 'Ã¥%'. In that case, the row starting with '[' is not found.
That said, I don’t think that we can fix the bug easily. Collations are part of the file format. If we changed an existing collation, then any indexes that depend on the collation definition could appear corrupted. What we could do is to introduce a new collation that would fix this.
Side note: There were 7-bit variants of the International Alphabet 5 (IA5, also known as ASCII). In the Swedish/Finnish one, the characters [\]{|} were replaced with ÄÖÅäöå. I might have expected 'ä%' but not 'å%' to match '['. |
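The encoding point can be checked outside the server. The following is a minimal Python sketch, assuming only the standard codecs (the variable names are illustrative and not taken from the test file). It shows that the UTF-8 bytes of 'å', when read as ISO-8859-1, become the two characters 'Ã¥', so a UTF-8 encoded test never sends the single byte 0xE5 that triggers the reported match.

```python
# 'å' is a single byte in ISO-8859-1 but two bytes in UTF-8.
latin1_bytes = "å".encode("iso-8859-1")   # b'\xe5'
utf8_bytes = "å".encode("utf-8")          # b'\xc3\xa5'

# Interpreting the UTF-8 bytes as ISO-8859-1 yields two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

print(latin1_bytes, utf8_bytes, misread)
```

This is why the test only reproduces the problem when the file itself is stored in ISO-8859-1.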
| Comment by Sergei Golubchik [ 2020-01-21 ] |
|
FYI: http://collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html |
| Comment by Alexander Barkov [ 2020-03-02 ] |
|
A workaround is to use a different latin1 collation, e.g.:
Or a utf8 collation:
|
| Comment by Alexander Barkov [ 2020-06-01 ] |
|
We'll try to add a new, correct collation latin1_swedish2_ci after fixing the major 10.5 bugs. Fixing the existing one is not desirable: users would have to rebuild all indexes if the collation changed. |
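The index-rebuild concern can be illustrated with a toy model. The Python sketch below is not MariaDB code. `old_weights` only mimics the behaviour reported in this issue ('å' weighted like '['), and `new_weights` is a purely hypothetical corrected ordering (it is an assumption for illustration, not the definition of latin1_swedish2_ci). It shows that a list sorted under the old weights is no longer sorted under the new ones, so a binary search misses a row that is present.

```python
import bisect

def old_weights(s):
    """Toy stand-in for the existing collation: case-insensitive,
    with 'å'/'Å' given the same weight as '[' (as reported here)."""
    return tuple(ord("[") if ch in "åÅ" else ord(ch.upper() if ch.isascii() else ch)
                 for ch in s)

def new_weights(s):
    """Hypothetical fixed collation: 'å'/'Å' get their own weight
    just after 'Z' (an assumption, chosen only for this example)."""
    return tuple(ord("Z") + 0.5 if ch in "åÅ" else ord(ch.upper() if ch.isascii() else ch)
                 for ch in s)

def lookup(index, value, weights):
    """Binary search over the stored order, as an index lookup would do."""
    keys = [weights(row) for row in index]
    pos = bisect.bisect_left(keys, weights(value))
    return pos < len(keys) and keys[pos] == weights(value)

rows = ["[a", "åb", "zz"]
index = sorted(rows, key=old_weights)          # index built under the old collation

found_old = lookup(index, "åb", old_weights)   # order matches the comparator
found_new = lookup(index, "åb", new_weights)   # stored order no longer matches
print(found_old, found_new)                    # prints: True False
```

The row is still in the list, but the search under the changed weights cannot find it, which is how an unchanged index would appear corrupted after a collation change.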