[MDEV-13081] Support the GB18030 encoding of Unicode
Created: 2017-06-13  Updated: 2017-06-13  Resolved: 2017-06-13

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Marko Mäkelä Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: character-set, collation

Issue Links:
Duplicate
is duplicated by MDEV-7495 GB18030 Open

Description

MySQL 5.7 WL#4024 introduced support for the Chinese national standard GB18030, an encoding for Unicode.

For Chinese, Japanese, Korean (CJK) text, GB18030 can be an interesting option: unlike UTF-8, it needs only 2 (not 3) bytes for the most common CJK characters, and unlike UTF-16, it needs only 1 byte (not 2) per ASCII character. The price that you have to pay is that most non-CJK, non-ASCII characters, as well as CJK characters outside the two-byte range, need 4 bytes, which is usually longer than in UTF-8 or UTF-16.
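
The size trade-off can be checked outside the server with Python's built-in codecs. This is only an illustrative sketch, not MariaDB code: the helper name encoded_lengths and the sample characters are chosen here for demonstration, and utf-16-le is used so that no byte order mark is counted.

```python
# Illustrative sketch only: compare encoded sizes using Python's stdlib codecs.
# The helper name and the sample characters are assumptions for demonstration.
ENCODINGS = ("gb18030", "utf-8", "utf-16-le")  # -le: do not count a BOM


def encoded_lengths(text):
    """Return a dict mapping each encoding name to the encoded byte count."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    samples = {
        "ASCII letter A": "A",
        "Han ideograph U+4E2D": "\u4e2d",
        "Hebrew letter U+05D0": "\u05d0",
    }
    for label, ch in samples.items():
        print(label, encoded_lengths(ch))
```

A common Han ideograph comes out at 2 bytes in GB18030 versus 3 in UTF-8, while a Hebrew letter takes 4 bytes in GB18030 versus 2 in UTF-8 and UTF-16.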

Because MariaDB 10.2 already incorporates the InnoDB of MySQL 5.7, the InnoDB adjustments for this should already be in place. The only missing piece is probably that fts_is_charset_cjk() needs to return true for gb18030, so that the fulltext index uses the hash-based internal partitioning scheme.

