Some functions in MY_CHARSET_HANDLER are not good enough, and more powerful functions have been added as replacements. This task is to clean up MY_CHARSET_HANDLER by removing the functions that have replacements.
We'll try to preserve the API as much as possible, in case some plugins use the old functions (but the ABI will change!).
1. Remove ismbchar() from MY_CHARSET_HANDLER:
and fix the code to use a new function added in 10.1 instead:
charlen() is a more powerful replacement for ismbchar(), as it can additionally:
- distinguish between a valid single-byte character (return value 1) and a broken byte (return value 0)
- report incomplete characters (premature end of input) with the MY_CS_TOOSMALXXX return values
For API compatibility purposes, the macro my_ismbchar() can be restored as a wrapper
function around cs->cset->charlen() instead of cs->cset->ismbchar(), something like this:
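A minimal sketch of such a wrapper, assuming charlen() takes the charset plus the start and end of the buffer and returns the character length in bytes (1 for a single-byte character, 0 for a broken byte, a negative value for an incomplete character). The names toy_charset, toy_handler, toy_charlen(), toy_ismbchar() and TOY_TOOSMALL are hypothetical stand-ins so that the example compiles on its own; they are not the real CHARSET_INFO / MY_CHARSET_HANDLER definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CHARSET_INFO / MY_CHARSET_HANDLER. */
typedef struct toy_charset toy_charset;
typedef struct {
  /* >1: multi-byte length, 1: single byte, 0: bad byte, <0: incomplete */
  int (*charlen)(const toy_charset *cs,
                 const unsigned char *s, const unsigned char *e);
} toy_handler;
struct toy_charset { const toy_handler *cset; };

#define TOY_TOOSMALL (-101)  /* assumed code for "premature end of input" */

/* Toy charlen(): bytes below 0x80 are single-byte characters, 0xC2..0xDF
   start a 2-byte character whose second byte must be 0x80..0xBF;
   everything else is treated as a broken byte. */
static int toy_charlen(const toy_charset *cs,
                       const unsigned char *s, const unsigned char *e)
{
  (void) cs;
  if (s >= e) return TOY_TOOSMALL;
  if (s[0] < 0x80) return 1;
  if (s[0] >= 0xC2 && s[0] <= 0xDF)
  {
    if (s + 2 > e) return TOY_TOOSMALL;
    return (s[1] >= 0x80 && s[1] <= 0xBF) ? 2 : 0;
  }
  return 0;
}

static const toy_handler toy_handler_instance = { toy_charlen };
static const toy_charset toy_cs = { &toy_handler_instance };

/* Compatibility wrapper with the old ismbchar() contract: the length of a
   valid multi-byte character, otherwise 0. */
static unsigned int toy_ismbchar(const toy_charset *cs,
                                 const char *s, const char *e)
{
  int len = cs->cset->charlen(cs, (const unsigned char *) s,
                                  (const unsigned char *) e);
  return len > 1 ? (unsigned int) len : 0;
}
```

The wrapper folds all three non-multi-byte outcomes (single byte, broken byte, incomplete character) into 0, which is exactly the information the old function could not distinguish. New code should call charlen() directly.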
2. Remove mbcharlen() from MY_CHARSET_HANDLER:
and add a new function added in 10.1 instead:
The new function, byte_property(), will return a combination of flags, e.g.:
- the byte is a stand-alone valid character
- the byte is a MB2 head
- the byte is a MB3 head
- the byte is a MB4 head
- the byte is a MB5 head
- the byte is a MB2 tail
- the byte is a MB3 tail
- the byte is a MB4 tail
- the byte is a MB5 tail
- the byte is MB23 continuation (e.g. the second byte in a 3-byte character)
- the byte is MB24 continuation (e.g. the second byte in a 4-byte character)
- the byte is MB34 continuation (e.g. the third byte in a 4-byte character)
- the byte is MBxy continuation (for all possible x and y combinations)
- and maybe some other flags
For API compatibility purposes, the old macro my_mbcharlen() can be rewritten as a wrapper around cs->cset->byte_property().
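A rough sketch of how such a wrapper could look, assuming the old mbcharlen() contract is "character length implied by the head byte, 0 if the byte cannot start a character". The flag names and bit values (TOY_BYTE_*) and the functions toy_byte_property() and toy_mbcharlen() are hypothetical, since the real flags are not defined yet. The toy classifier uses UTF-8-like byte ranges only for illustration.

```c
#include <assert.h>

/* Hypothetical flag values; the real names and bits are not defined yet. */
enum {
  TOY_BYTE_SINGLE   = 1 << 0,  /* stand-alone valid character */
  TOY_BYTE_MB2_HEAD = 1 << 1,  /* first byte of a 2-byte character */
  TOY_BYTE_MB3_HEAD = 1 << 2,  /* first byte of a 3-byte character */
  TOY_BYTE_MB4_HEAD = 1 << 3,  /* first byte of a 4-byte character */
  TOY_BYTE_MB_TAIL  = 1 << 4   /* tail or continuation byte */
};

/* Toy byte_property() for a UTF-8-like encoding. */
static unsigned int toy_byte_property(unsigned char c)
{
  if (c < 0x80) return TOY_BYTE_SINGLE;
  if (c < 0xC0) return TOY_BYTE_MB_TAIL;
  if (c < 0xC2) return 0;                 /* never valid */
  if (c < 0xE0) return TOY_BYTE_MB2_HEAD;
  if (c < 0xF0) return TOY_BYTE_MB3_HEAD;
  if (c < 0xF5) return TOY_BYTE_MB4_HEAD;
  return 0;                               /* never valid */
}

/* Compatibility wrapper: derive the old "length by head byte" answer
   from the property flags. */
static unsigned int toy_mbcharlen(unsigned char c)
{
  unsigned int p = toy_byte_property(c);
  if (p & TOY_BYTE_SINGLE)   return 1;
  if (p & TOY_BYTE_MB2_HEAD) return 2;
  if (p & TOY_BYTE_MB3_HEAD) return 3;
  if (p & TOY_BYTE_MB4_HEAD) return 4;
  return 0;
}
```

In this toy encoding every byte has at most one property. Returning a bit mask rather than a single value matters for character sets where the same byte value can be both a head and a tail, which is something a plain length cannot express.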
3. Remove well_formed_len() from MY_CHARSET_HANDLER:
and use a new function added in 10.1 instead:
The new function is a replacement for well_formed_len() and numchars() at the same time. It can return:
a. the "number of characters" as its return value
b. the "number of bytes", which can be calculated directly from status->m_source_end_pos
c. the position of the first bad byte in status->m_well_formed_error_pos, or NULL if there are no bad bytes
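To illustrate how one call can give all three results, here is a self-contained sketch. Only the two field names m_source_end_pos and m_well_formed_error_pos come from the description above. The structure toy_copy_status, the functions toy_len() and toy_well_formed_char_length(), and the parameter list (buffer start, buffer end, maximum number of characters, status) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status structure, modelled on the two fields named above. */
typedef struct {
  const char *m_source_end_pos;         /* where scanning stopped */
  const char *m_well_formed_error_pos;  /* first bad byte, or NULL */
} toy_copy_status;

/* Toy classifier for a UTF-8-like subset: returns 1 or 2 for a valid
   character, 0 for a bad or truncated sequence. Requires s < e. */
static int toy_len(const unsigned char *s, const unsigned char *e)
{
  if (s[0] < 0x80) return 1;
  if (s[0] >= 0xC2 && s[0] <= 0xDF && s + 1 < e &&
      s[1] >= 0x80 && s[1] <= 0xBF)
    return 2;
  return 0;
}

/* Scan at most nchars characters from [str, end), stopping at the first
   bad byte. Returns the number of well-formed characters scanned. */
static size_t toy_well_formed_char_length(const char *str, const char *end,
                                          size_t nchars,
                                          toy_copy_status *status)
{
  const char *p = str;
  size_t n = 0;
  status->m_well_formed_error_pos = NULL;
  while (n < nchars && p < end)
  {
    int len = toy_len((const unsigned char *) p,
                      (const unsigned char *) end);
    if (len == 0)
    {
      status->m_well_formed_error_pos = p;  /* remember the bad byte */
      break;
    }
    p += len;
    n++;
  }
  status->m_source_end_pos = p;
  return n;
}
```

With this shape, the old numchars() result is the return value, the old well_formed_len() result is status.m_source_end_pos - str, and the old error flag is status.m_well_formed_error_pos != NULL.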