[MDEV-9411] Wrong sort order when Collation = latin1_swedish_ci Created: 2016-01-13  Updated: 2017-11-05  Resolved: 2017-11-05

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.0.20
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Terje Rosenlund Assignee: Alexander Barkov
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Server: hostnamectl
Static hostname: intwebsrv1
Icon name: computer-vm
Chassis: vm
Machine ID: *****************************
Boot ID: *****************************
Virtualization: microsoft
Operating System: Debian GNU/Linux 8 (jessie)
Kernel: Linux 3.16.0-4-amd64
Architecture: x86-64

my.cnf:
[mysql]
default-character-set=utf8
[mysqld]
collation_server=utf8_unicode_ci
character_set_server=utf8
default-character-set=utf8
init_connect='SETcollation_connection=utf8_unicode_ci'
init_connect='SETNAMESutf8'

mySql:
Server version: 10.0.20-MariaDB-0+deb8u1-log (Debian)



 Description   

DROP TABLE IF EXISTS `table1`;
CREATE TABLE IF NOT EXISTS `table1` (
  `Tekst` varchar(255) CHARACTER SET latin1 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
INSERT INTO `table1` (`Tekst`) VALUES
('A'),
('a'),
('b'),
('c'),
('d'),
('E'),
('e'),
('O'),
('o'),
('Æ'),
('Ø'),
('Å'),
('æ'),
('ø'),
('å'),
('x'),
('y'),
('z');
 
SHOW FULL COLUMNS FROM table1;	=> Collation = latin1_swedish_ci
 
USE test;
show variables like 'character%';	
 
=> 
Variable_name				Value
-----------------------------------
character_set_client		utf8mb4
character_set_connection	utf8mb4
character_set_database		latin1
character_set_filesystem		binary
character_set_results		utf8mb4
character_set_server		utf8
character_set_system		utf8
 
SELECT * FROM table1 ORDER BY Tekst ASC
 
Tekst
A
a
b
c
d
e
E
O
o
x
y
z
Å
å
æ
Æ
ø
Ø
 
Problem: 
1 - Å and å before æ (should be last)
2 - æ before Æ (flipped)
3 - ø before Ø (flipped)
4 - e before E (flipped)
 
ENGINE=InnoDB DEFAULT CHARSET=latin1; <= Gives same result

Is this is a bug or is it my setup that causes problems?



 Comments   
Comment by Elena Stepanova [ 2016-01-13 ]

If I understand this chart right, I could guess that the case-insensitive collation accounts for problems 2, 3, 4.
For the problem 1, the chart puts Å and å before æ – maybe it's one of the contradictions that the MySQL manual talks about:

Although it is frequently said that it is based on the Swedish/Finnish collation rules, there are Swedes and Finns who disagree with this statement.

I will redirect this question to the character set expert bar who can say for sure if it's a bug or not.

Comment by Alexander Barkov [ 2017-11-05 ]

Elena is right, for problems 2, 3,4 the reason is case insensitive collation. The order of equal characters is not predictable.
You can sort like this, to get a strict order:

SELECT .. ORDER BY Tekst ASC, BINARY Tekst;

For the problems N1:
We use Unicode's Common Locale Data Repository as an authority for the proper order of various language collations.
The Finnish collation definition can be found here: http://www.unicode.org/repos/cldr/trunk/common/collation/fi.xml

It says that the order is:
<å<<<Å<ä<<<Ä<<æ<<<Æ

So å is expected to go before æ.
latin1_swedish_ci looks correct.

Generated at Thu Feb 08 07:34:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.