[MDEV-352] parse errors in loadable UCA / LDML collations are silently ignored Created: 2012-06-17  Updated: 2014-12-01  Resolved: 2014-11-18

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5.24
Fix Version/s: 10.0.6

Type: Bug Priority: Major
Reporter: Hartmut Holzgraefe Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: upstream-fixed


 Description   

[12 Jun 21:22] Hartmut Holzgraefe

Description:
When a UCA collation is defined using LDML syntax in share/charsets/Index.xml, any syntax error in the collation definition makes the collation unavailable after a mysqld restart, and no startup error message about the parse failure is reported at all.

How to repeat:

  • restart mysqld and verify that columns using utf8_phone_ci can be used
  • now add a parse error, e.g. by simply removing the 'u' after the backslash in one of the Unicode code point definitions, like replacing

<reset>\u0000</reset>

with

<reset>\0000</reset>

  • restart the server once more
  • verify that utf8_phone_ci can't be used anymore
  • check the mysqld error log for any collation related error message

=> there is none

Suggested fix:
Report errors found while parsing the loadable collations during startup to the mysqld error log.



 Comments   
Comment by Hartmut Holzgraefe [ 2012-06-17 ]

See also http://bugs.mysql.com/bug.php?id=65593

Comment by Sergei Golubchik [ 2012-06-20 ]

http://lists.mysql.com/commits/144273

Comment by Sergei Golubchik [ 2013-01-01 ]

fixed in mysql-5.6.6

Comment by Elena Stepanova [ 2014-11-09 ]

bar,
If it's still relevant after all the changes you've made regarding character sets, maybe it makes sense to merge this fix of yours?

If the bug is irrelevant to the current version, please just close it.

Comment by Alexander Barkov [ 2014-11-18 ]

The problem is not present in 10.0, as the fix for MySQL Bug #65593 was merged earlier into 10.0.5 under the terms of MDEV-4928.
Collation definition syntax errors are now reported to stderr without problems.

Note, 10.0 also extends the limits of the reset and shift sequences,
so it won't be possible to see the error using the example from this bug report:
<reset>\0000</reset>

It's now a valid sequence: it means resetting to a sequence of five characters (a backslash followed by four zeros).

Hmm, perhaps literal use of the backslash character should have been disallowed from the very beginning... Would be helpful to avoid typos like this.
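
Since the server now accepts a literal backslash, a typo like the one above could only be caught outside the server. A minimal sketch of such a check follows, assuming a standalone Python script that is not part of MariaDB; the function name find_stray_backslashes, the set of rule tags, and the sample XML are hypothetical and chosen only for illustration. It flags any backslash in a rule element that does not start a \uXXXX escape.

```python
import re
import xml.etree.ElementTree as ET

# LDML rule elements whose text holds characters or \uXXXX escapes.
# This set is an assumption for illustration, not the server's full list.
RULE_TAGS = {"reset", "p", "s", "t", "i"}

# A backslash that is NOT followed by 'u' and at least four hex digits.
STRAY_BACKSLASH = re.compile(r"\\(?!u[0-9A-Fa-f]{4})")


def find_stray_backslashes(xml_text):
    """Return (tag, text) pairs for rule elements with a suspicious backslash."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in RULE_TAGS and elem.text and STRAY_BACKSLASH.search(elem.text):
            problems.append((elem.tag, elem.text))
    return problems


# Hypothetical fragment in the style of Index.xml; the raw string keeps
# the backslashes exactly as they would appear in the file.
sample = r"""
<charset name="utf8">
  <collation name="utf8_phone_ci">
    <rules>
      <reset>\0000</reset>
      <i>\u0020</i>
    </rules>
  </collation>
</charset>
"""

for tag, text in find_stray_backslashes(sample):
    print(f"suspicious <{tag}>: {text!r}")
```

Run against the fragment above, this reports only the <reset> element with the missing 'u'. The check is deliberately stricter than the 10.0 parser, which treats that text as a valid five-character sequence.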

Generated at Thu Feb 08 06:28:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.