[MDEV-27042] UCA: Resetting contractions to ignorable does not work well Created: 2021-11-14  Updated: 2021-11-29  Resolved: 2021-11-24

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
Fix Version/s: 10.8.0

Type: Bug
Priority: Major
Reporter: Alexander Barkov
Assignee: Alexander Barkov
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Blocks: MDEV-27009 Add UCA-14.0.0 collations (Closed)

 Description   

I patch the character set configuration file Index.xml as follows:

diff --git a/Index.xml.orig b/Index.xml
index cec3bfc..c69047e 100644
--- a/Index.xml.orig
+++ b/Index.xml
@@ -540,6 +540,19 @@ To make maintaining easier please:
     <flag>binary</flag>
     <flag>compiled</flag>
   </collation>
+  <collation name="utf8mb3_phone_ci" id="352">
+    <rules>
+      <reset>\u0000</reset>
+        <i>\u0020</i> <!-- space -->
+        <i>\u0028</i> <!-- left parenthesis -->
+        <i>\u0029</i> <!-- right parenthesis -->
+        <i>\u002B</i> <!-- plus -->
+        <i>\u002D</i> <!-- hyphen -->
+        <i>tel.</i>
+    </rules>
+  </collation>
 </charset>
 
 <charset name="ucs2">

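That is, the new collation should treat the following as ignorable:

  • some punctuation characters (space, parentheses, plus, hyphen)
  • the four-character sequence "tel.", defined as a contraction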

Now I run this script:

CREATE OR REPLACE TABLE t1
(
  phone VARCHAR(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_phone_ci
);
INSERT INTO t1 VALUES ('123'),('tel.123');
SELECT * FROM t1 WHERE phone='123';

+-------+
| phone |
+-------+
| 123   |
+-------+

This looks wrong. The query should return both rows: since "tel." is ignorable, 'tel.123' should compare equal to '123'.

Now I run:

SELECT phone, HEX(WEIGHT_STRING(phone)) FROM t1;

+---------+---------------------------+
| phone   | HEX(WEIGHT_STRING(phone)) |
+---------+---------------------------+
| 123     | 0E2A0E2B0E2C              |
| tel.123 |                           |
+---------+---------------------------+

This also looks wrong: the weight string for 'tel.123' is empty, but it should be equal to the weight string for '123' (0E2A0E2B0E2C).
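
The expected behaviour can be illustrated with a small Python model. This is only a sketch of the intended semantics, not MariaDB's UCA implementation: the function name `expected_weight_hex` is hypothetical, the ignorable list mirrors the `<i>` rules in the patch above, and the weights for '1', '2' and '3' are copied from the WEIGHT_STRING output above. Any other non-ignorable character is outside the model and raises `KeyError`.

```python
# Toy model of the *expected* result; not how the server computes weights.

# Ignorable sequences from the <rules> section of the patch,
# sorted longest first so the contraction "tel." is tried before single characters.
IGNORABLE = sorted(["tel.", " ", "(", ")", "+", "-"], key=len, reverse=True)

# Weights observed in the HEX(WEIGHT_STRING(phone)) output for '123'.
WEIGHTS = {"1": "0E2A", "2": "0E2B", "3": "0E2C"}


def expected_weight_hex(s: str) -> str:
    """Return the hex weight string the report expects for s."""
    out = []
    i = 0
    while i < len(s):
        # Find the longest ignorable sequence starting at position i, if any.
        match = next((seq for seq in IGNORABLE if s.startswith(seq, i)), None)
        if match is not None:
            i += len(match)  # ignorable: contributes no weight
            continue
        out.append(WEIGHTS[s[i]])  # KeyError for characters outside the model
        i += 1
    return "".join(out)


print(expected_weight_hex("123"))
print(expected_weight_hex("tel.123"))
```

Under this model both calls produce the same weight string, which is the outcome the report expects from the server for both rows of t1.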



 Comments   
Comment by Sergei Golubchik [ 2021-11-24 ]

Pushed into preview-10.8-MDEV-27009-uca-1400.

Comment by Alexander Barkov [ 2021-11-24 ]

Pushed into 10.8 separately, so it does not have to wait for the whole MDEV-27009 work to be finished.
