[MDEV-13981] Implement SQL2014 compliant LIKE (utf8mb4_unicode_ci produces different result '=' vs. LIKE) Created: 2017-10-02  Updated: 2017-10-02

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: None

Type: Task Priority: Major
Reporter: Roman Stingler (Inactive) Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

the problem is that the mentiond collation doesn't return the expected result in the LIKE query

MariaDB [(none)]> SELECT "ss" LIKE "ß" COLLATE utf8mb4_unicode_ci;
+-------------------------------------------+
| "ss" LIKE "ß" COLLATE utf8mb4_unicode_ci  |
+-------------------------------------------+
|                                         0 |
+-------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT "ss" = "ß" COLLATE utf8mb4_unicode_ci;
+----------------------------------------+
| "ss" = "ß" COLLATE utf8mb4_unicode_ci  |
+----------------------------------------+
|                                      1 |
+----------------------------------------+
1 row in set (0.00 sec)

it just happens with the ß character, other umlauts are fine

MariaDB [(none)]> SELECT "o" = "ö" COLLATE utf8mb4_unicode_ci;
+---------------------------------------+
| "o" = "ö" COLLATE utf8mb4_unicode_ci  |
+---------------------------------------+
|                                     1 |
+---------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT "o" LIKE "ö" COLLATE utf8mb4_unicode_ci;
+------------------------------------------+
| "o" LIKE "ö" COLLATE utf8mb4_unicode_ci  |
+------------------------------------------+
|                                        1 |
+------------------------------------------+
1 row in set (0.00 sec)

Also the expected behaviour on the general collation is fine

MariaDB [(none)]> SELECT "s" LIKE "ß" COLLATE utf8mb4_general_ci;
+------------------------------------------+
| "s" LIKE "ß" COLLATE utf8mb4_general_ci  |
+------------------------------------------+
|                                        1 |
+------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT "s" = "ß" COLLATE utf8mb4_general_ci;
+---------------------------------------+
| "s" = "ß" COLLATE utf8mb4_general_ci  |
+---------------------------------------+
|                                     1 |
+---------------------------------------+
1 row in set (0.00 sec)



 Comments   
Comment by Alexander Barkov [ 2017-10-02 ]

This is not a bug. This implementation is intentional.

This is a quote from SQL2011, section <like predicate>:

If the i-th substring specifier of PCV is a single character specifier, then the i-th substring
of MCV contains exactly 1 (one) character that is equal to the character represented by the
single character specifier according to the collation of the <like predicate>.

This clearly states that comparison is done one-character-to-one-character.
Therefore:

'ss' LIKE 'ß'

can never return true.

However, I just noticed that in the latest Standard version (SQL2014), this section was changed to:

If the i-th substring specifier of PCV is sequence of single character specifiers, then the
i-th substring of MCV contains 1 (one) or more characters and
'PCVi' = 'MCVi' COLLATE LCN
is True.

So since SQL2014, it's supposed to return true for:

'ss' LIKE 'ß'

But still:

'ß' LIKE 'ss'

is supposed to return false. Looks weird for me.

Comment by Axel Schwenke [ 2017-10-02 ]

Notice that the MySQL manual has an example just like the above to demonstrate that LIKE and operator == may return different results under certain circumstances.

Comment by Roman Stingler (Inactive) [ 2017-10-02 ]

Programmers should ask themselves if it an expected behaviour.
In my opinion it isn't.

Firstly I have a collation which treats ß like ss, but only in certain circumstances and only in a specific way and ......

Comment by Alexander Barkov [ 2017-10-02 ]

Sorry, I misread the quote from SQL2014. It actually says that both:

'ss' LIKE 'ß'
'ß' LIKE 'ss'

are true.

Comment by Alexander Barkov [ 2017-10-02 ]

Sounds like a reasonable feature request: implement SQL2014 compliant LIKE.

Generated at Thu Feb 08 08:09:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.