Details
- Task
- Status: Closed
- Major
- Resolution: Fixed
- None
Description
axel found a performance bottleneck in the Unicode collation implementation (MDEV-16413).
Under the terms of this task, we'll reorganize the code implementing Unicode collations (such as utf8_unicode_ci) into a new style that replaces virtual function calls with inlined calls:
The old style
- There is one copy of every collation function (strnncoll(), strnncollsp(), hash_sort(), strnxfrm()), e.g.:
static int my_strnncollsp_uca(CHARSET_INFO *cs,
my_uca_scanner_handler *scanner_handler,
const uchar *s, size_t slen,
const uchar *t, size_t tlen)
- Character set-specific routines are passed in scanner_handler
- scanner_handler->next() is called virtually (i.e. via a pointer to a function)
- scanner_handler->next() itself calls cs->cset->mb_wc() virtually
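To make the cost of the old style concrete, here is a minimal sketch of the double indirection described above. It is not the server code: all `demo_*` names, the one-byte decoder, and the toy case-insensitive weights are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real structures. */
typedef struct demo_charset
{
  /* Decodes one character; returns bytes consumed, or 0 at end of input. */
  int (*mb_wc)(unsigned long *wc, const unsigned char *s, const unsigned char *e);
} demo_charset;

typedef struct demo_scanner
{
  const demo_charset *cs;
  const unsigned char *pos, *end;
} demo_scanner;

typedef struct demo_scanner_handler
{
  /* Returns the next weight, or -1 at end of input. */
  int (*next)(demo_scanner *scanner);
} demo_scanner_handler;

/* Toy decoder: every byte is one character. */
static int demo_mb_wc_8bit(unsigned long *wc, const unsigned char *s, const unsigned char *e)
{
  if (s >= e)
    return 0;
  *wc = *s;
  return 1;
}

/* Toy weight function: ASCII letters compare case-insensitively. */
static int demo_scanner_next(demo_scanner *scanner)
{
  unsigned long wc;
  /* Indirect call #2: the decoder is reached through a function pointer. */
  int len = scanner->cs->mb_wc(&wc, scanner->pos, scanner->end);
  if (len <= 0)
    return -1;
  scanner->pos += len;
  return (wc >= 'a' && wc <= 'z') ? (int) (wc - 'a' + 'A') : (int) wc;
}

static const demo_charset demo_cs_8bit = { demo_mb_wc_8bit };
static const demo_scanner_handler demo_handler = { demo_scanner_next };

/* One shared comparison routine serves every character set. */
static int demo_strnncoll(const demo_charset *cs, const demo_scanner_handler *h,
                          const unsigned char *s, size_t slen,
                          const unsigned char *t, size_t tlen)
{
  demo_scanner a = { cs, s, s + slen };
  demo_scanner b = { cs, t, t + tlen };
  int wa, wb;
  do
  {
    wa = h->next(&a); /* Indirect call #1: once per character, per string. */
    wb = h->next(&b);
  } while (wa == wb && wa >= 0);
  return wa - wb;
}
```

In this sketch every character costs two calls through function pointers, which the compiler generally cannot inline because the targets are only known at run time.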
The new style
- There are multiple implementations of each function, one per character set.
- There is a shared file, ctype-uca.ic, which is included multiple times, once per character set.
- Character set-specific information is passed via macros:
#include "ctype-utf8.h"
#define MY_FUNCTION_NAME(x) my_uca_ ## x ## _utf8mb3
#define MY_MB_WC(scanner, wc, beg, end) (my_mb_wc_utf8mb3_quick(wc, beg, end))
#define MY_LIKE_RANGE my_like_range_mb
#include "ctype-uca.ic"
- There are inline my_mb_wc_CSNAME_quick() implementations in new header files: ctype-utf8.h, ctype-ucs2.h, ctype-utf16.h, ctype-utf32.h
The old version generated less executable code, but was slower.
The new version will generate more code, but will be much faster: there will be no virtual function calls at all. All calls inside the new functions will be either inlined or at least resolved statically.
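The new style can be sketched in a single file as follows. Because a self-contained example cannot include a separate ctype-uca.ic, a parameterized macro (`DEMO_DEFINE_STRNNCOLL`) stands in for the repeated `#include`; that macro, the `demo_*` decoders, and the toy weights are all assumptions for illustration, not the actual implementation.

```c
#include <stddef.h>

/* Hypothetical inline decoders, in the spirit of my_mb_wc_CSNAME_quick(). */
static inline int demo_mb_wc_latin1_quick(unsigned long *wc,
                                          const unsigned char *s,
                                          const unsigned char *e)
{
  if (s >= e)
    return 0;
  *wc = s[0];
  return 1;
}

static inline int demo_mb_wc_ucs2_quick(unsigned long *wc,
                                        const unsigned char *s,
                                        const unsigned char *e)
{
  if (e - s < 2)
    return 0;
  *wc = ((unsigned long) s[0] << 8) | s[1]; /* two bytes, big-endian */
  return 2;
}

/* Toy weight function: ASCII letters compare case-insensitively. */
static inline int demo_weight(unsigned long wc)
{
  return (wc >= 'a' && wc <= 'z') ? (int) (wc - 'a' + 'A') : (int) wc;
}

/* Stand-in for the shared .ic file: each expansion yields one function
   specialized for one character set, with the decoder named at compile time. */
#define DEMO_DEFINE_STRNNCOLL(FUNC_NAME, MB_WC)                     \
static int FUNC_NAME(const unsigned char *s, size_t slen,           \
                     const unsigned char *t, size_t tlen)           \
{                                                                   \
  const unsigned char *se = s + slen, *te = t + tlen;               \
  for (;;)                                                          \
  {                                                                 \
    unsigned long swc, twc;                                         \
    int sl = MB_WC(&swc, s, se);                                    \
    int tl = MB_WC(&twc, t, te);                                    \
    int sw = sl > 0 ? demo_weight(swc) : -1;                        \
    int tw = tl > 0 ? demo_weight(twc) : -1;                        \
    if (sw != tw || sw < 0)                                         \
      return sw - tw;                                               \
    s += sl;                                                        \
    t += tl;                                                        \
  }                                                                 \
}

/* One instantiation per character set. */
DEMO_DEFINE_STRNNCOLL(demo_strnncoll_latin1, demo_mb_wc_latin1_quick)
DEMO_DEFINE_STRNNCOLL(demo_strnncoll_ucs2, demo_mb_wc_ucs2_quick)
```

Each instantiation names its decoder directly, so the compiler sees a static call target and is free to inline it. The price is one copy of the comparison loop per character set.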
Part #2: additional changes:
- Add fast paths to handle ASCII characters
- Add dedicated MY_COLLATION_HANDLERs for collations with no contractions (for the utf8 and utf8mb4 character sets). The choice between the full-featured handler and the "no contraction" handler should be made at collation initialization time.
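The two Part #2 ideas might look roughly like this. It is a toy sketch under stated assumptions: the `demo_*` names, the doubled weights, the single "ch" contraction, and the 2-byte-only UTF-8 slow path are all hypothetical and much simpler than a real UCA implementation.

```c
#include <stddef.h>

/* Toy weights are doubled so that the hypothetical "ch" contraction
   can sort between 'H' and 'I'. */
#define DEMO_WEIGHT_CH (2 * 'H' + 1)

static inline int demo_ascii_weight(unsigned char c)
{
  return 2 * ((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
}

/* Scanner for collations without contractions. */
static inline int demo_next_simple(const unsigned char **pos, const unsigned char *end)
{
  const unsigned char *s = *pos;
  if (s >= end)
    return -1;
  if (s[0] < 0x80) /* ASCII fast path: no multi-byte decoding needed */
  {
    *pos = s + 1;
    return demo_ascii_weight(s[0]);
  }
  /* Slow path: this toy only understands 2-byte UTF-8 sequences. */
  if ((s[0] & 0xE0) == 0xC0 && end - s >= 2 && (s[1] & 0xC0) == 0x80)
  {
    *pos = s + 2;
    return 2 * (((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
  }
  return -1; /* anything else ends the scan in this sketch */
}

/* Full-featured scanner: checks for a contraction before each character. */
static inline int demo_next_full(const unsigned char **pos, const unsigned char *end)
{
  const unsigned char *s = *pos;
  if (end - s >= 2 && (s[0] | 0x20) == 'c' && (s[1] | 0x20) == 'h')
  {
    *pos = s + 2;
    return DEMO_WEIGHT_CH;
  }
  return demo_next_simple(pos, end);
}

/* Generates one comparison function per scanner; NEXT is a static call target. */
#define DEMO_DEFINE_STRNNCOLL(FUNC_NAME, NEXT)                      \
static int FUNC_NAME(const unsigned char *s, size_t slen,           \
                     const unsigned char *t, size_t tlen)           \
{                                                                   \
  const unsigned char *se = s + slen, *te = t + tlen;               \
  int sw, tw;                                                       \
  do                                                                \
  {                                                                 \
    sw = NEXT(&s, se);                                              \
    tw = NEXT(&t, te);                                              \
  } while (sw == tw && sw >= 0);                                    \
  return sw - tw;                                                   \
}

DEMO_DEFINE_STRNNCOLL(demo_strnncoll_full, demo_next_full)
DEMO_DEFINE_STRNNCOLL(demo_strnncoll_simple, demo_next_simple)

/* Hypothetical collation object holding the chosen handler function. */
typedef struct demo_collation
{
  int has_contractions;
  int (*strnncoll)(const unsigned char *, size_t, const unsigned char *, size_t);
} demo_collation;

/* The handler is picked once, at initialization, not per character. */
static void demo_collation_init(demo_collation *c, int has_contractions)
{
  c->has_contractions = has_contractions;
  c->strnncoll = has_contractions ? demo_strnncoll_full : demo_strnncoll_simple;
}
```

With this layout, a collation without contractions never pays for the contraction check inside the comparison loop; the only remaining indirect call is the single outer call through the handler.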
Issue Links
- blocks
  - MDEV-16413 test performance of distinct range queries (Closed)
- relates to
  - MDEV-17502 Change Unicode xxx_general_ci and xxx_bin collation implementation to "inline" style (Closed)
  - MDEV-33621 Unify duplicate code in my_wildcmp_uca_impl() and my_wildcmp_unicode_impl() (Closed)