[MDEV-8362] dash '-' is not recognized in charset armscii8 on select where query Created: 2015-06-23  Updated: 2015-07-14  Resolved: 2015-07-14

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.1.67, 5.2.14, 5.3.12, 5.5, 10.0, 10.1
Fix Version/s: 10.1.6

Type: Bug Priority: Critical
Reporter: Yingyu Cheng Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: upstream
Environment:
  1. mysql --version
    mysql Ver 15.1 Distrib 5.5.39-MariaDB, for debian-linux-gnu (x86_64) using readline 5.1

Attachments: File bugtest.sql    
Sprint: 10.1.6-2

 Description   

It looks like the db server cannot match a value that contains a dash "-" in an equality comparison. As far as I know, the affected charset is armscii8.

For details, see the repro commands below; the db dump is attached:

MariaDB [bugtest]> create table test(columnname varchar(64) CHARACTER SET armscii8);
Query OK, 0 rows affected (0.07 sec)
 
MariaDB [bugtest]> insert into test values ('abc-def');
Query OK, 1 row affected (0.04 sec)
 
MariaDB [bugtest]> select * from test where columnname = 'abc-def';
Empty set (0.00 sec)
 
MariaDB [bugtest]> select * from test where columnname like 'abc%';
+------------+
| columnname |
+------------+
| abc-def    |
+------------+
1 row in set (0.00 sec)
 



 Comments   
Comment by Elena Stepanova [ 2015-06-24 ]

Thanks for the report.

Same on MySQL 5.7, so if it's a bug, it's an upstream issue.
bar,
It does look like a bug to me, but I don't know how much this charset is supported.
If you decide it should be fixed, but prefer to treat it as an upstream bug, please report it at bugs.mysql.com (or maybe you know it has already been reported?). Alternatively, it can be fixed directly in MariaDB.

Comment by Alexander Barkov [ 2015-06-26 ]

It seems that the problem happens during utf8-to-armscii8 conversion, because
the following ASCII characters have a second encoding in the 8-bit range (0x80..0xFF) in addition to their usual 7-bit one:

0xA4   U+0029   RIGHT PARENTHESIS
0xA5   U+0028   LEFT PARENTHESIS
0xA9   U+002E   FULL STOP
0xAB   U+002C   COMMA
0xAC   U+002D   HYPHEN-MINUS
0xFF   U+0027   APOSTROPHE

So the utf8 dash '-' is erroneously converted to armscii8 0xAC instead of 0x2D:

MariaDB [test]> SELECT HEX(CONVERT(_utf8 0x2D USING armscii8));
+-----------------------------------------+
| HEX(CONVERT(_utf8 0x2D USING armscii8)) |
+-----------------------------------------+
| AC                                      |
+-----------------------------------------+
1 row in set (0.00 sec)

This should be fixed.
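
To illustrate how a double encoding can produce this result, here is a minimal Python sketch. It is not the server code: the table covers only the characters listed above plus their ASCII forms, and the function names are hypothetical. It assumes the reverse (Unicode-to-byte) map is built by scanning byte values in ascending order, so a later 8-bit entry can overwrite the 7-bit one unless the lowest byte is preferred.

```python
# Partial byte -> Unicode code point table (illustrative, not the full charset).
TO_UNICODE = {
    0x27: 0x0027, 0x28: 0x0028, 0x29: 0x0029,
    0x2C: 0x002C, 0x2D: 0x002D, 0x2E: 0x002E,
    0xA4: 0x0029, 0xA5: 0x0028, 0xA9: 0x002E,
    0xAB: 0x002C, 0xAC: 0x002D, 0xFF: 0x0027,
}


def reverse_last_wins(table):
    """Naive reverse map: a higher byte overwrites a lower one."""
    rev = {}
    for byte in sorted(table):
        rev[table[byte]] = byte
    return rev


def reverse_lowest_wins(table):
    """Reverse map that keeps the lowest byte for each code point."""
    rev = {}
    for byte in sorted(table):
        rev.setdefault(table[byte], byte)
    return rev


naive = reverse_last_wins(TO_UNICODE)
fixed = reverse_lowest_wins(TO_UNICODE)

print(hex(naive[0x002D]))  # 0xac, the 8-bit duplicate
print(hex(fixed[0x002D]))  # 0x2d, the plain ASCII byte
```

With the lowest-byte rule, every character in this partial table converts back to its 7-bit ASCII byte.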

There is also a problem in the collation definition. It should probably sort the double-encoded characters as equal (e.g. armscii8 0x2D should compare equal to 0xAC).
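
A rough Python sketch of the collation idea, under the assumption that each 8-bit duplicate is simply given the same weight as its ASCII counterpart. The `DOUBLE_CODED` table and `sort_key` function are hypothetical names for illustration; a real collation would also handle case folding and the rest of the charset.

```python
# 8-bit duplicate byte -> ASCII byte, taken from the list in this comment.
DOUBLE_CODED = {
    0xA4: 0x29, 0xA5: 0x28, 0xA9: 0x2E,
    0xAB: 0x2C, 0xAC: 0x2D, 0xFF: 0x27,
}


def sort_key(data: bytes) -> bytes:
    """Map each duplicate byte to its ASCII form before comparing."""
    return bytes(DOUBLE_CODED.get(b, b) for b in data)


ascii_form = b"abc-def"      # dash encoded as 0x2D
high_form = b"abc\xacdef"    # dash encoded as 0xAC

print(ascii_form == high_form)                      # False
print(sort_key(ascii_form) == sort_key(high_form))  # True
```

A plain byte comparison treats the two strings as different, while comparing by the normalized key treats them as equal.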

Comment by Yingyu Cheng [ 2015-06-27 ]

@Elena Stepanova, I did not test or report to upstream MySQL. Do I need to do that? Or maybe you can help?

Comment by Elena Stepanova [ 2015-06-27 ]

winguse,
No problem, if we decide it's worth trying, then either bar or I will do that.
