[MDEV-6776] ujis and eucjpms erroneously accept 0x8EA0 as a valid byte sequence Created: 2014-09-24  Updated: 2015-01-28  Resolved: 2014-09-25

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5.39, 10.0.13
Fix Version/s: 10.0.14

Type: Bug
Priority: Minor
Reporter: Alexander Barkov
Assignee: Alexander Barkov
Resolution: Fixed
Votes: 0
Labels: None


 Description   

Byte sequence 0x8EA0 is erroneously accepted as a valid ujis/eucjpms code:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET ujis);
INSERT INTO t1 VALUES (0x8EA0);
SELECT HEX(a), CHAR_LENGTH(a) FROM t1;

returns:

+--------+----------------+
| HEX(a) | CHAR_LENGTH(a) |
+--------+----------------+
| 8EA0   |              2 |
+--------+----------------+

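This is wrong: after the lead byte 0x8E, the second byte must fall in [xA1-xDF], and 0xA0 lies just below that range. The correct code ranges for ujis are: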

  [x00-x7F]                     # ASCII/JIS-Roman (one-byte/character)  
  [x8E][xA1-xDF]                # half-width katakana (two bytes/char)  
  [x8F][xA1-xFE][xA1-xFE]       # JIS X 0212-1990 (three bytes/char)  
  [xA1-xFE][xA1-xFE]            # JIS X 0208:1997 (two bytes/char)

The same problem is observed with eucjpms.
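
The ranges listed above can be checked mechanically. The following is a minimal Python sketch of such a check, written only from the table in this report; it is not the server's implementation, and the function names ujis_char_length and is_well_formed_ujis are hypothetical.

```python
def ujis_char_length(data: bytes, pos: int = 0) -> int:
    """Return the byte length of a valid ujis character starting at
    data[pos], or 0 if no valid character starts there.
    Hypothetical helper based solely on the ranges listed in this report."""

    def in_range(i: int, lo: int, hi: int) -> bool:
        # True if byte i exists and lies within [lo, hi].
        return i < len(data) and lo <= data[i] <= hi

    if pos >= len(data):
        return 0
    lead = data[pos]
    if lead <= 0x7F:
        # ASCII/JIS-Roman: one byte per character.
        return 1
    if lead == 0x8E:
        # Half-width katakana: the trail byte must be in A1..DF.
        return 2 if in_range(pos + 1, 0xA1, 0xDF) else 0
    if lead == 0x8F:
        # JIS X 0212: two trail bytes, each in A1..FE.
        ok = in_range(pos + 1, 0xA1, 0xFE) and in_range(pos + 2, 0xA1, 0xFE)
        return 3 if ok else 0
    if 0xA1 <= lead <= 0xFE:
        # JIS X 0208: one trail byte in A1..FE.
        return 2 if in_range(pos + 1, 0xA1, 0xFE) else 0
    return 0


def is_well_formed_ujis(data: bytes) -> bool:
    """Return True if data consists entirely of valid ujis characters."""
    pos = 0
    while pos < len(data):
        n = ujis_char_length(data, pos)
        if n == 0:
            return False
        pos += n
    return True


print(is_well_formed_ujis(bytes.fromhex("8EA0")))  # False: A0 is below A1
print(is_well_formed_ujis(bytes.fromhex("8EA1")))  # True: first valid katakana code
```

Under these rules 0x8EA0 is rejected, while 0x8EA1 through 0x8EDF are accepted as single two-byte characters.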

