[MDEV-6353] my_ismbchar() and my_mbcharlen() refactoring Created: 2014-06-17 Updated: 2020-05-05 Resolved: 2016-05-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Fix Version/s: | 10.2.1 |
| Type: | Task | Priority: | Minor |
| Reporter: | Alexander Barkov | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | refactoring | ||
| Issue Links: |
|
| Sprint: | 10.2.0-6, 10.2.0-7, 10.2.0-9, 10.2.0-10, 10.2.0-11, 10.2.1-1, 10.2.1-2 | ||||||||||||||||||||||||||||||||||||
| Description |
|
Currently MY_CHARSET_HANDLER has two functions to handle
This API is not flexible enough.
1. Invalid byte sequences cannot be detected: mbcharlen() reports invalid bytes as characters with length=1, so this API gives the caller no way to tell a valid single-byte character from an invalid byte.
   MDEV-6218 Wrong result of CHAR_LENGTH(non-BMP-character) with 3-byte utf8
2. The first byte is not always enough to detect a character length.
This API should be changed into a single function:
For performance purposes, the caller must supply a string consisting of
The function will return the same codes that mb_wc() does:
1. Positive numbers on success:
1a. 1 in case of a one-byte character found starting at "str"
2. Non-positive number on error:
Note, the function will ask the caller for as few more bytes as possible.
a. charlen(0xFE) will return MY_CS_TOOSMALL, asking for one more byte only.
b. charlen(0xFEFE) will return 2, meaning a two-byte character.
c. charlen(0xFE30) will return MY_CS_TOOSMALL2, asking for two more bytes.
d. charlen(0xFE3081) will return MY_CS_TOOSMALL, asking for one more byte.
e. charlen(0xFE308130) will return 4, meaning a four-byte character.
Note, the affected charset handler functions are currently almost never used directly:
Instead, they are used through the macros my_mbcharlen() and my_ismbchar():
This is very fortunate. my_ismbchar(cs, a, b) will call cs->cset->charlen(), but then will
This is to return 0 for all cases where the old macros my_ismbchar()
my_mbcharlen(cs, ch) will return 1 for single byte characters and for
Later, when adding gb18030, we will replace some calls to my_ismbchar() and my_mbcharlen() |
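
The charlen() return codes in examples a. to e. above can be illustrated with a small standalone sketch. It is not the MariaDB implementation. The byte ranges are modeled on GB18030, which is an assumption (the description only mentions gb18030 as a later addition). The names `sketch_charlen`, `sketch_ismbchar`, and the `SKETCH_*` constants are hypothetical, and the numeric values of the constants are placeholders, since the description names MY_CS_TOOSMALL and MY_CS_TOOSMALL2 without giving their values. The wrapper at the end shows one plausible reading of the compatibility rule for my_ismbchar(): anything that is not a valid multi-byte character maps to 0.

```c
#include <stddef.h>

/* Placeholder codes for this sketch only (non-positive means error). */
enum {
  SKETCH_ILSEQ     = 0,     /* hypothetical name: invalid byte sequence */
  SKETCH_TOOSMALL  = -101,  /* stands in for MY_CS_TOOSMALL: one more byte needed */
  SKETCH_TOOSMALL2 = -102   /* stands in for MY_CS_TOOSMALL2: two more bytes needed */
};

/* Length of the character starting at s, looking no further than e.
   Assumed GB18030-like layout:
     1 byte : 0x00..0x7F
     2 bytes: 0x81..0xFE, then 0x40..0x7E or 0x80..0xFE
     4 bytes: 0x81..0xFE, 0x30..0x39, 0x81..0xFE, 0x30..0x39 */
static int sketch_charlen(const unsigned char *s, const unsigned char *e)
{
  if (s >= e)
    return SKETCH_TOOSMALL;
  if (s[0] < 0x80)
    return 1;
  if (s[0] < 0x81 || s[0] > 0xFE)
    return SKETCH_ILSEQ;              /* 0x80 and 0xFF cannot start a character */
  if (s + 1 >= e)
    return SKETCH_TOOSMALL;           /* the second byte decides between 2 and 4 */
  if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
    return 2;
  if (s[1] < 0x30 || s[1] > 0x39)
    return SKETCH_ILSEQ;
  if (s + 2 >= e)
    return SKETCH_TOOSMALL2;          /* four-byte form: two bytes still missing */
  if (s[2] < 0x81 || s[2] > 0xFE)
    return SKETCH_ILSEQ;
  if (s + 3 >= e)
    return SKETCH_TOOSMALL;           /* only the last byte is missing */
  if (s[3] < 0x30 || s[3] > 0x39)
    return SKETCH_ILSEQ;
  return 4;
}

/* Assumed compatibility wrapper: returns the length of a valid multi-byte
   character, and 0 for single-byte characters, invalid bytes, and
   truncated input. */
static unsigned sketch_ismbchar(const char *a, const char *b)
{
  int len = sketch_charlen((const unsigned char *) a, (const unsigned char *) b);
  return len > 1 ? (unsigned) len : 0;
}
```

Unlike a first-byte lookup, this function can distinguish an invalid byte (error code) from a valid one-byte character (1), which is the distinction problem 1 in the description says the old API cannot make.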