[MDEV-14127] ORDER BY COLLATE 'utf8mb4_german2_ci' ASC is inconsistent with DESC Created: 2017-10-25  Updated: 2017-10-31  Resolved: 2017-10-30

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5.57
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Mitur Sakoda Assignee: Alexander Barkov
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

mysql Ver 14.14 Distrib 5.7.9, for osx10.9 (x86_64) using EditLine wrapper

Connection id: 2358
Current database: LIBRARY0
Current user: root@192.168.255.100
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.5.5-10.2.9-MariaDB-10.2.9+maria~trusty-log mariadb.org binary distribution
Protocol version: 10
Connection: tenant-mariadb.service.imanagecloud.com via TCP/IP
Server characterset: latin1
Db characterset: utf8mb4
Client characterset: utf8
Conn. characterset: utf8



 Description   

Sorting with COLLATE 'utf8mb4_german2_ci' gives inconsistent results with ASC vs. DESC, they are not the reverse of each other.

The details should be clear from the example below.

mysql> SELECT C1, C2, C3 FROM CHARSET_IT_1508963224106 WHERE C2 = 'testGermanSortDIN5007v2' ORDER BY C1 COLLATE 'utf8mb4_german2_ci' ASC;
+----------+-------------------------+------+
| C1       | C2                      | C3   |
+----------+-------------------------+------+
| Göbel    | testGermanSortDIN5007v2 |    0 |
| Goethe   | testGermanSortDIN5007v2 |    1 |
| Göthe    | testGermanSortDIN5007v2 |    2 |
| Götz     | testGermanSortDIN5007v2 |    3 |
| Goldmann | testGermanSortDIN5007v2 |    4 |
+----------+-------------------------+------+
5 rows in set (0.00 sec)
 
mysql> SELECT C1, C2, C3 FROM CHARSET_IT_1508963224106 WHERE C2 = 'testGermanSortDIN5007v2' ORDER BY C1 COLLATE 'utf8mb4_german2_ci' DESC;
+----------+-------------------------+------+
| C1       | C2                      | C3   |
+----------+-------------------------+------+
| Goldmann | testGermanSortDIN5007v2 |    4 |
| Götz     | testGermanSortDIN5007v2 |    3 |
| Goethe   | testGermanSortDIN5007v2 |    1 |
| Göthe    | testGermanSortDIN5007v2 |    2 |
| Göbel    | testGermanSortDIN5007v2 |    0 |
+----------+-------------------------+------+
5 rows in set (0.00 sec)
 
mysql> show create table CHARSET_IT_1508963224106;
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table                    | Create Table                                                                                                                                                                                                                                                       |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CHARSET_IT_1508963224106 | CREATE TABLE `CHARSET_IT_1508963224106` (
  `C1` varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `C2` varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `C3` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)



 Comments   
Comment by Elena Stepanova [ 2017-10-30 ]

The result is easily reproducible, but I'm not sure it's a bug.

In german2 (German phone book order), 'ö' and 'oe' are synonyms, thus 'Goethe' and 'Göthe' are equal values and can be placed in any order.

I'll leave it to the character set expert bar to confirm (or not).

Comment by Alexander Barkov [ 2017-10-30 ]

This is not a bug. As Elena wrote, 'Goethe' and 'Göthe' are equal values.
The order of equal values is not predicable.

To change the mutual order of equal values, try this:

SELECT C1, C2, C3 FROM CHARSET_IT_1508963224106 WHERE C2 = 'testGermanSortDIN5007v2' ORDER BY C1 COLLATE 'utf8mb4_german2_ci' DESC, BINARY c1 DESC;

Comment by Mitur Sakoda [ 2017-10-31 ]

That worked. Thanks!

Generated at Thu Feb 08 08:11:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.