As part of this task, we'll reorganize the code implementing Unicode collations (such as utf8_unicode_ci) into a new style that replaces virtual function calls with inlining.
Old version:
- There is one copy of every collation function (strnncoll(), strnncollsp(), hash_sort(), strnxfrm()), e.g.:
- Character-set-specific routines are passed via scanner_handler
- scanner_handler->next() is called virtually (i.e. via a pointer to a function)
- scanner_handler->next() itself calls cs->cset->mb_wc() virtually
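To make the cost of the old design concrete, here is a minimal C sketch of its shape, assuming heavily simplified, hypothetical types: `DEMO_CHARSET`, `demo_scanner_handler`, `demo_strnncoll()` and the other `demo_` names are illustrative only, not MariaDB's actual structures. A single shared comparison function reaches every character through two indirect calls: the scanner's `next()` and then the character set's `mb_wc()`. Weights here are plain code points; a real UCA scanner maps code points through weight tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real charset/scanner types. */
typedef unsigned long my_wc_t;
typedef struct demo_charset DEMO_CHARSET;

typedef struct
{
  /* Decode one character from [s, e); return its byte length, 0 at end. */
  int (*mb_wc)(const DEMO_CHARSET *cs, my_wc_t *wc,
               const unsigned char *s, const unsigned char *e);
} DEMO_CSET_HANDLER;

struct demo_charset
{
  const char *name;
  const DEMO_CSET_HANDLER *cset;
};

typedef struct
{
  const unsigned char *s, *e;
  const DEMO_CHARSET *cs;
} demo_scanner;

typedef struct
{
  int (*next)(demo_scanner *sc);  /* next weight, or -1 at end */
} demo_scanner_handler;

/* A one-byte-per-character decoder (Latin-1 style). */
static int demo_latin1_mb_wc(const DEMO_CHARSET *cs, my_wc_t *wc,
                             const unsigned char *s, const unsigned char *e)
{
  (void) cs;
  if (s >= e)
    return 0;
  *wc = *s;
  return 1;
}

static int demo_generic_next(demo_scanner *sc)
{
  my_wc_t wc;
  int len;
  assert(sc->s <= sc->e);
  /* Indirect call: mb_wc() is reached through cs->cset. */
  len = sc->cs->cset->mb_wc(sc->cs, &wc, sc->s, sc->e);
  if (len <= 0)
    return -1;
  sc->s += len;
  return (int) wc;  /* toy weight: the code point itself */
}

static const DEMO_CSET_HANDLER demo_latin1_handler = { demo_latin1_mb_wc };
static const DEMO_CHARSET demo_latin1 = { "latin1", &demo_latin1_handler };
static const demo_scanner_handler demo_generic_scanner = { demo_generic_next };

/* The single shared collation function: every character costs two
   indirect calls, h->next() and then cs->cset->mb_wc(). */
static int demo_strnncoll(const demo_scanner_handler *h,
                          const DEMO_CHARSET *cs,
                          const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen)
{
  demo_scanner sa = { a, a + alen, cs };
  demo_scanner sb = { b, b + blen, cs };
  for (;;)
  {
    int wa = h->next(&sa);  /* indirect call */
    int wb = h->next(&sb);  /* indirect call */
    if (wa != wb)
      return wa < wb ? -1 : 1;
    if (wa < 0)
      return 0;  /* both strings ended at the same time */
  }
}
```

Neither indirect call can be inlined, because both targets are only known at run time.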
New version:
- There are multiple implementations of each function, one per character set.
- There is a shared file ctype-uca.ic, which is included multiple times, once per character set.
- Character-set-specific information is passed via macros:
- There are inline my_mb_wc_CSNAME_quick() implementations in new header files: ctype-utf8.h, ctype-ucs2.h, ctype-utf16.h, ctype-utf32.h
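The new layout can be approximated in a single file by letting a macro stand in for the repeated `#include` of a shared `.ic` file: the template is instantiated once per character set, and each instance calls its own `static inline` decoder directly, so the compiler is free to inline it. This is a sketch under stated assumptions: the macro `DEMO_DEFINE_STRNNCOLL` and the decoders `demo_mb_wc_utf8_quick()` and `demo_mb_wc_ucs2_quick()` are hypothetical and do not reproduce the real ctype-uca.ic macros or the `my_mb_wc_CSNAME_quick()` signatures. Weights are again plain code points, and the UTF-8 decoder handles only 1- to 3-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long my_wc_t;

/* Hypothetical per-charset decoders, static inline so that each template
   instance can inline its own decoder. Each returns the character's byte
   length, 0 at end of input, or -1 on malformed input. */
static inline int demo_mb_wc_utf8_quick(my_wc_t *wc, const unsigned char *s,
                                        const unsigned char *e)
{
  unsigned char c;
  if (s >= e)
    return 0;
  c = s[0];
  if (c < 0x80)                                  /* 1 byte (ASCII) */
  {
    *wc = c;
    return 1;
  }
  if (c >= 0xC2 && c < 0xE0)                     /* 2 bytes */
  {
    if (e - s < 2 || (s[1] & 0xC0) != 0x80)
      return -1;
    *wc = ((my_wc_t) (c & 0x1F) << 6) | (s[1] & 0x3F);
    return 2;
  }
  if (c >= 0xE0 && c < 0xF0)                     /* 3 bytes */
  {
    if (e - s < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
      return -1;
    *wc = ((my_wc_t) (c & 0x0F) << 12) |
          ((my_wc_t) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    return *wc < 0x800 ? -1 : 3;                 /* reject overlong forms */
  }
  return -1;
}

static inline int demo_mb_wc_ucs2_quick(my_wc_t *wc, const unsigned char *s,
                                        const unsigned char *e)
{
  if (s >= e)
    return 0;
  if (e - s < 2)
    return -1;                                   /* truncated character */
  *wc = ((my_wc_t) s[0] << 8) | s[1];            /* big-endian UCS-2 */
  return 2;
}

/* Stand-in for including a shared .ic file once per charset: the body
   is stamped out once per (function name, decoder) pair. */
#define DEMO_DEFINE_STRNNCOLL(FUNC_NAME, MB_WC)                            \
static int FUNC_NAME(const unsigned char *a, size_t alen,                  \
                     const unsigned char *b, size_t blen)                  \
{                                                                          \
  const unsigned char *ae = a + alen, *be = b + blen;                      \
  for (;;)                                                                 \
  {                                                                        \
    my_wc_t wa = 0, wb = 0;                                                \
    int la, lb;                                                            \
    assert(a <= ae && b <= be);                                            \
    la = MB_WC(&wa, a, ae);  /* direct call, no function pointer */        \
    lb = MB_WC(&wb, b, be);                                                \
    if (la <= 0 || lb <= 0)  /* end (toy: malformed input ends too) */     \
      return (la > 0) - (lb > 0);                                          \
    if (wa != wb)                                                          \
      return wa < wb ? -1 : 1;                                             \
    a += la;                                                               \
    b += lb;                                                               \
  }                                                                        \
}

DEMO_DEFINE_STRNNCOLL(demo_strnncoll_utf8, demo_mb_wc_utf8_quick)
DEMO_DEFINE_STRNNCOLL(demo_strnncoll_ucs2, demo_mb_wc_ucs2_quick)
```

Each instantiation is a separate copy of the loop in the binary, which is where the extra executable code comes from; in exchange, no call inside the loop goes through a pointer.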
The old version generated a smaller amount of executable code but was slower.
The new version will generate more code but will be much faster: there will be no virtual function calls at all. All calls inside the new functions will be either inlined or, at least, made statically (as direct calls).
Additionally:
- Add fast paths to handle ASCII characters.
- Add dedicated MY_COLLATION_HANDLERs for collations with no contractions (for the utf8 and utf8mb4 character sets). The choice between the full-featured handler and the "no contractions" handler should be made at collation initialization time.
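Both improvements can be sketched together, assuming a toy collation with hypothetical names throughout (`demo_weight()`, `demo_select_handler()`, `DEMO_COLLATION_HANDLER`): weights are case-folded bytes, doubled so that odd values are free for contractions, and a made-up contraction "ch" sorts between "h" and "i". The no-contraction handler can take an ASCII fast path because a byte below 0x80 is always a complete character in utf8 and utf8mb4 and, without contractions, never needs lookahead. The handler is chosen once by `demo_select_handler()`, standing in for the choice made at collation initialization time. The slow path is only a placeholder that compares bytes; a real one would decode the character and look up its UCA weight.

```c
#include <assert.h>
#include <stddef.h>

/* Toy weight of one byte: ASCII letters fold to upper case, and all
   weights are doubled so that odd values are free for contractions. */
static int demo_weight(unsigned char c)
{
  if (c >= 'a' && c <= 'z')
    c = (unsigned char) (c - 'a' + 'A');
  return 2 * c;
}

/* Full-featured scanner: checks for the toy contraction "ch" first.
   Returns the next weight and advances *pos, or returns -1 at the end. */
static int demo_next_full(const unsigned char *s, size_t len, size_t *pos)
{
  assert(*pos <= len);
  if (*pos >= len)
    return -1;
  if (*pos + 1 < len && demo_weight(s[*pos]) == 2 * 'C' &&
      demo_weight(s[*pos + 1]) == 2 * 'H')
  {
    *pos += 2;
    return 2 * 'H' + 1;  /* sorts right after "h", before "i" */
  }
  return demo_weight(s[(*pos)++]);
}

static int demo_strnncoll_full(const unsigned char *a, size_t alen,
                               const unsigned char *b, size_t blen)
{
  size_t i = 0, j = 0;
  for (;;)
  {
    int wa = demo_next_full(a, alen, &i);
    int wb = demo_next_full(b, blen, &j);
    if (wa != wb)
      return wa < wb ? -1 : 1;
    if (wa < 0)
      return 0;
  }
}

/* Placeholder slow path: a real one would decode a multi-byte character
   and look up its weight; this toy compares byte weights. */
static int demo_slow_tail(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen)
{
  size_t i;
  for (i = 0; i < alen && i < blen; i++)
    if (demo_weight(a[i]) != demo_weight(b[i]))
      return demo_weight(a[i]) < demo_weight(b[i]) ? -1 : 1;
  return (alen > blen) - (alen < blen);
}

static int demo_strnncoll_nocontractions(const unsigned char *a, size_t alen,
                                         const unsigned char *b, size_t blen)
{
  size_t i = 0;
  /* ASCII fast path: one byte is one character and, with no contractions,
     no lookahead is needed, so weights are compared directly. */
  while (i < alen && i < blen && a[i] < 0x80 && b[i] < 0x80)
  {
    int wa = demo_weight(a[i]), wb = demo_weight(b[i]);
    if (wa != wb)
      return wa < wb ? -1 : 1;
    i++;
  }
  return demo_slow_tail(a + i, alen - i, b + i, blen - i);
}

typedef struct
{
  int (*strnncoll)(const unsigned char *a, size_t alen,
                   const unsigned char *b, size_t blen);
} DEMO_COLLATION_HANDLER;

static const DEMO_COLLATION_HANDLER demo_handler_full =
  { demo_strnncoll_full };
static const DEMO_COLLATION_HANDLER demo_handler_nocontractions =
  { demo_strnncoll_nocontractions };

/* Called once while the collation is initialized, not per comparison. */
static const DEMO_COLLATION_HANDLER *demo_select_handler(int has_contractions)
{
  return has_contractions ? &demo_handler_full : &demo_handler_nocontractions;
}
```

Selecting the handler once keeps every comparison free of a per-call "does this collation have contractions?" test, and the no-contraction code never pays for contraction lookahead.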