[MDEV-21533] 'å' equals '[' in the latin1_swedish_ci collation Created: 2020-01-20  Updated: 2020-06-23

Status: Confirmed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.5

Type: Bug Priority: Minor
Reporter: Marek Gibney Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: collation
Environment: Debian 10


Issue Links:
Relates
relates to MDEV-21436 Character comparison incorect Confirmed

 Description   

The following code outputs both rows. I think it should only output the one starting with 'å'. (Tested on 10.3.18)

CREATE TABLE test (
  name VARCHAR(100)
) CHARACTER SET latin1 COLLATE latin1_swedish_ci;
 
INSERT INTO test(name) VALUES('[hello world]');
INSERT INTO test(name) VALUES('ålesund');
 
SELECT * FROM test WHERE name LIKE 'å%';
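
The same comparison can also be checked without depending on how the client or the script file encodes 'å'. This is a minimal sketch, assuming the server accepts `_latin1` introducers on hex literals; 0xE5, 0x5B and 0x25 are the latin1 codes for 'å', '[' and '%', and the column alias is only illustrative. If the report holds, the first query should return 1 on affected versions and the second should return both rows.

```sql
-- 0xE5 is 'å' and 0x5B is '[' in latin1; hex literals avoid any
-- dependence on the client or file encoding.
SELECT _latin1 X'E5' COLLATE latin1_swedish_ci = _latin1 X'5B'
       AS a_ring_equals_bracket;

-- The LIKE form of the same check against the table above.
-- X'E525' is the two-character latin1 pattern 'å%'.
SELECT name FROM test WHERE name LIKE _latin1 X'E525';
```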



 Comments   
Comment by Marko Mäkelä [ 2020-01-20 ]

I can confirm the issue with mysql-test-run.pl, but the .test file must be encoded in ISO-8859-1. If it were encoded in UTF-8, it would be testing something else: 'Ã¥lesund' and 'Ã¥%'. In that case, the row starting with '[' is not found.
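
The effect of the file encoding can be seen by looking at the bytes. This is a sketch that assumes nothing beyond the built-in UNHEX, HEX, CONVERT and CHAR_LENGTH functions; the column aliases are illustrative. UTF-8 encodes 'å' as the two bytes C3 A5, and when those bytes are read as latin1 they become the two characters 'Ã' and '¥', so the pattern no longer starts with a single 'å'.

```sql
-- The UTF-8 bytes of 'å', interpreted as latin1, form two characters.
SELECT CONVERT(UNHEX('C3A5') USING latin1) AS read_as_latin1,
       CHAR_LENGTH(CONVERT(UNHEX('C3A5') USING latin1)) AS n_chars;

-- Going the other way: the single latin1 byte E5 ('å') converted to utf8.
SELECT HEX(CONVERT(_latin1 X'E5' USING utf8)) AS utf8_bytes;
```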

That said, I don’t think that we can fix the bug easily. Collations are part of the file format. If we changed an existing collation, then any indexes that depend on the collation definition could appear corrupted.

What we could do is to introduce a new collation that would fix this.

Side note: There were 7-bit variants of the International Alphabet 5 (IA5, also known as ASCII). In the Swedish/Finnish one, the characters [\\\]{|} were replaced with ÄÖÖÖÅäöå (listing Ö several times only because Jira seems to want to display the backslash multiple times). I might have expected 'ä%' but not 'å%' to match '['.

Comment by Sergei Golubchik [ 2020-01-21 ]

FYI: http://collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html

Comment by Alexander Barkov [ 2020-03-02 ]

A workaround is to use a different latin1 collation, e.g.:
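
A minimal sketch of this workaround, assuming latin1_general_ci as the alternative collation (chosen here for illustration, not necessarily the one originally meant) and the test table from the description. Applying COLLATE to the column rather than to the literal should keep the expression valid whatever the connection character set is.

```sql
-- Per query: override the column's collation for this comparison only.
SELECT * FROM test
WHERE name COLLATE latin1_general_ci LIKE 'å%';

-- Permanently: change the column's collation.
-- Any index on the column has to be rebuilt as part of this change.
ALTER TABLE test
  MODIFY name VARCHAR(100) CHARACTER SET latin1 COLLATE latin1_general_ci;
```

Note that a different collation also changes sort order and equality rules for other characters, so it may not be a drop-in replacement for Swedish data.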

Or a utf8 collation:
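
A minimal sketch of the utf8 variant, assuming utf8_swedish_ci as the target collation (an assumption for illustration) and the test table from the description. With utf8_general_ci, 'å' compares equal to 'a', so the pattern 'å%' would also match names starting with 'a'; a Swedish-specific collation is assumed here to avoid that.

```sql
-- Per query: convert the latin1 column to utf8 and compare
-- under a utf8 collation.
SELECT * FROM test
WHERE CONVERT(name USING utf8) COLLATE utf8_swedish_ci LIKE 'å%';

-- Permanently: convert the whole table, which rewrites the stored data
-- and rebuilds its indexes.
ALTER TABLE test CONVERT TO CHARACTER SET utf8 COLLATE utf8_swedish_ci;
```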

Comment by Alexander Barkov [ 2020-06-01 ]

We'll try to add a new correct collation latin1_swedish2_ci after fixing 10.5 major bugs.

Fixing the existing one is not desirable: users will have to rebuild all indexes in case of a collation change.
