[MDEV-15716] ExtractValue and UpdateValue are case sensitive. So working with XML tags is a headache Created: 2018-03-29  Updated: 2018-04-03  Resolved: 2018-04-03

Status: Closed
Project: MariaDB Server
Component/s: XML Functions
Affects Version/s: 10.2.12, 10.3.5
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Francisco Dueñas Assignee: Alexander Barkov
Resolution: Not a Bug Votes: 0
Labels: Compatibility
Environment:

Windows 10 64 bit



 Description   

ExtractValue and UpdateValue are case sensitive. So for looking for a certaing Tag in the correct case can be a difficult task.

readin MySQL 5.5 and above Manuals,
https://dev.mysql.com/doc/refman/5.7/en/xml-functions.html

They say that:
(Previously, binary—that is, case-sensitive—comparison was always used.)

So how to make those functions to work case insensitive

Also we should improve These function's documentation for how o use those functions



 Comments   
Comment by Elena Stepanova [ 2018-04-02 ]

I also fail to understand what that note in MySQL manual means. The complete fragment says

This function uses the current SQL collation for making comparisons with contains(), performing the same collation aggregation as other string functions (such as CONCAT()), in taking into account the collation coercibility of their arguments; see Section 10.8.4, “Collation Coercibility in Expressions”, for an explanation of the rules governing this behavior.

(Previously, binary—that is, case-sensitive—comparison was always used.)

Which reads to me like ExtractValue should be using the current connection collation while extracting the value. But I don't see it happen (even on MySQL 5.7):

5.7.21

MySQL [(none)]> show variables like 'collat%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.01 sec)
 
MySQL [(none)]> SELECT ExtractValue('<A>ccc<b>ddd</b></A>', '/a');
+--------------------------------------------+
| ExtractValue('<A>ccc<b>ddd</b></A>', '/a') |
+--------------------------------------------+
|                                            |
+--------------------------------------------+
1 row in set (0.00 sec)
 
MySQL [(none)]> SELECT ExtractValue('<a>ccc<b>ddd</b></a>', '/a');
+--------------------------------------------+
| ExtractValue('<a>ccc<b>ddd</b></a>', '/a') |
+--------------------------------------------+
| ccc                                        |
+--------------------------------------------+
1 row in set (0.00 sec)

At the same time,

MySQL [(none)]> SELECT 'A' = 'a';
+-----------+
| 'A' = 'a' |
+-----------+
|         1 |
+-----------+
1 row in set (0.01 sec)

Hopefully bar can clarify what should work how here.

Comment by Alexander Barkov [ 2018-04-03 ]

Tag names are always case sensitive in XML, no matter the current collation is. The current behavior is correct and expected.

The manual article is about XPath function contains(). This script demonstrates how the current collation affects XPath contains():

+--------------------------------------------------+
| ExtractValue('<a>xxx</a>', 'contains(/a,"XXX")') |
+--------------------------------------------------+
| 1                                                |
+--------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> SET NAMES utf8 COLLATE utf8_bin; SELECT ExtractValue('<a>xxx</a>', 'contains(/a,"XXX")');
Query OK, 0 rows affected (0.00 sec)
 
+--------------------------------------------------+
| ExtractValue('<a>xxx</a>', 'contains(/a,"XXX")') |
+--------------------------------------------------+
| 0                                                |
+--------------------------------------------------+
1 row in set (0.00 sec)

Generated at Thu Feb 08 08:23:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.