Details
- Task
- Status: Closed
- Major
- Resolution: Fixed
- None
Description
This task is similar to MDEV-17474, but for general_ci and _bin collations.
The current implementation, my_strnxfrm_unicode_internal(), has some bottlenecks:
- it uses cs->cset->mb_wc() virtual calls
- it accesses cs->state and cs->caseinfo
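To make the bottlenecks concrete, here is a minimal sketch of the "handler" style. All names in it (struct demo_charset, demo_mb_wc_8bit, demo_strnxfrm_handler, demo_upper) are hypothetical stand-ins, not the real CHARSET_INFO API, and the decoder handles a one-byte charset rather than UTF-8. The point is only that every character costs an indirect call plus a load of case data through cs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for CHARSET_INFO. */
struct demo_charset;

typedef int (*demo_mb_wc_fn)(const struct demo_charset *cs,
                             unsigned long *pwc,
                             const unsigned char *s,
                             const unsigned char *e);

struct demo_charset
{
  demo_mb_wc_fn mb_wc;            /* called through a pointer per character */
  const unsigned char *to_upper;  /* case data, also reached through cs */
};

/* Case table for the demo charset; only ASCII letters are folded. */
static unsigned char demo_upper[256];

static void demo_init_upper(void)
{
  int i;
  for (i= 0; i < 256; i++)
    demo_upper[i]= (unsigned char) ((i >= 'a' && i <= 'z') ? i - 0x20 : i);
}

/* Decoder for a one-byte-per-character demo charset (not UTF-8). */
static int demo_mb_wc_8bit(const struct demo_charset *cs, unsigned long *pwc,
                           const unsigned char *s, const unsigned char *e)
{
  (void) cs;
  if (s >= e)
    return 0;                     /* end of input */
  *pwc= *s;
  return 1;                       /* one byte consumed */
}

/* Writes a two-byte weight per character; returns the bytes written. */
static size_t demo_strnxfrm_handler(const struct demo_charset *cs,
                                    unsigned char *dst, size_t dstlen,
                                    const unsigned char *src, size_t srclen)
{
  unsigned char *d= dst;
  unsigned char *de= dst + dstlen;
  const unsigned char *se= src + srclen;
  unsigned long wc;
  int len;

  assert(cs->mb_wc != NULL && cs->to_upper != NULL);
  while ((size_t) (de - d) >= 2 &&
         (len= cs->mb_wc(cs, &wc, src, se)) > 0)   /* indirect call */
  {
    src+= len;
    if (wc < 256)
      wc= cs->to_upper[wc];                        /* load through cs */
    *d++= (unsigned char) (wc >> 8);
    *d++= (unsigned char) (wc & 0xFF);
  }
  return (size_t) (d - dst);
}
```

Because the decoder and the case table are only known at run time, the compiler cannot inline either of them into the loop.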
We'll change the code by adding new strnxfrm-family function templates into strings/strcoll.ic.
Functions my_strnxfrm_unicode_internal(), my_strnxfrm_unicode(), my_strnxfrm_unicode_nopad() will migrate from strings/ctype-utf8.c to such function templates in strings/strcoll.ic.
Every collation will include strings/strcoll.ic and pass collation-specific parameters, such as the mb_wc() function and the related UNICASE data.
Additionally, we'll add fast paths for ASCII data.
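What an ASCII fast path might look like is sketched below. This is an illustration under stated assumptions, not the code from the patch: demo_mb_wc_utf8mb3() and demo_strnxfrm_ascii_fast() are hypothetical names, and the weights are simplified so that only ASCII letters are folded to upper case while every other code point weighs as itself (the real general_ci collation folds many more characters).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder for 1..3-byte UTF-8 sequences (BMP only).
   Returns the number of bytes consumed, or 0 on end of input or bad data. */
static int demo_mb_wc_utf8mb3(unsigned long *pwc,
                              const unsigned char *s, const unsigned char *e)
{
  if (s >= e)
    return 0;
  if (s[0] < 0x80)
  {
    *pwc= s[0];
    return 1;
  }
  if (s[0] >= 0xC2 && s[0] <= 0xDF)
  {
    if (e - s < 2 || (s[1] & 0xC0) != 0x80)
      return 0;
    *pwc= ((unsigned long) (s[0] & 0x1F) << 6) | (unsigned long) (s[1] & 0x3F);
    return 2;
  }
  if ((s[0] & 0xF0) == 0xE0)
  {
    if (e - s < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
      return 0;
    if (s[0] == 0xE0 && s[1] < 0xA0)
      return 0;                                  /* overlong encoding */
    *pwc= ((unsigned long) (s[0] & 0x0F) << 12) |
          ((unsigned long) (s[1] & 0x3F) << 6) |
          (unsigned long) (s[2] & 0x3F);
    return 3;
  }
  return 0;
}

/* Writes a two-byte weight per character; returns the bytes written. */
static size_t demo_strnxfrm_ascii_fast(unsigned char *dst, size_t dstlen,
                                       const unsigned char *src, size_t srclen)
{
  unsigned char *d= dst;
  unsigned char *de= dst + dstlen;
  const unsigned char *s= src;
  const unsigned char *se= src + srclen;

  assert(dst != NULL && src != NULL);
  while ((size_t) (de - d) >= 2 && s < se)
  {
    unsigned long wc;
    if (*s < 0x80)
    {
      /* Fast path: an ASCII byte needs no decoder call and no table lookup. */
      wc= (*s >= 'a' && *s <= 'z') ? (unsigned long) (*s - 0x20)
                                   : (unsigned long) *s;
      s++;
    }
    else
    {
      /* Slow path: decode a multi-byte sequence. */
      int len= demo_mb_wc_utf8mb3(&wc, s, se);
      if (len <= 0)
        break;                                   /* stop on bad input */
      s+= len;
    }
    *d++= (unsigned char) (wc >> 8);
    *d++= (unsigned char) (wc & 0xFF);
  }
  return (size_t) (d - dst);
}
```

For mostly-ASCII data the loop then rarely leaves the first branch, which is where the expected gain comes from.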
After these changes, the template instantiation (e.g. for utf8_general_ci) will look like this:
#define MY_FUNCTION_NAME(x) my_ ## x ## _utf8_general_ci
#define DEFINE_STRNXFRM_UNICODE
#define DEFINE_STRNXFRM_UNICODE_NOPAD
#define MY_MB_WC(cs, pwc, s, e) my_mb_wc_utf8mb3_quick(pwc, s, e)
#define OPTIMIZE_ASCII 1
#define UNICASE_MAXCHAR MY_UNICASE_INFO_DEFAULT_MAXCHAR
#define UNICASE_PAGE0 my_unicase_default_page00
#define UNICASE_PAGES my_unicase_default_pages
...
#include "strcoll.ic"
The template included in this example will:
- use my_mb_wc_utf8mb3_quick() directly (inline or at least statically), instead of a virtual call.
- use MY_UNICASE_INFO_DEFAULT_MAXCHAR, my_unicase_default_page00, my_unicase_default_pages directly, without dereferencing members of CHARSET_INFO.
- enable the fast path for ASCII data.
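The instantiation idea can be sketched in a single file. Since a self-contained example cannot re-include a separate .ic file, the sketch below swaps in a function-defining macro (DEMO_DEFINE_STRNXFRM, a hypothetical name) in place of the #define-then-#include technique; the effect is similar in that each instantiation names its decoder and weight function at compile time. All demo_ names are assumptions made for illustration, and the decoder handles a one-byte charset rather than UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-byte decoder; returns bytes consumed, 0 at end of input. */
static int demo_mb_wc_8bit(unsigned long *pwc,
                           const unsigned char *s, const unsigned char *e)
{
  if (s >= e)
    return 0;
  *pwc= *s;
  return 1;
}

/* Weight for a case-insensitive demo collation: fold ASCII letters. */
static unsigned long demo_weight_ci(unsigned long wc)
{
  return (wc >= 'a' && wc <= 'z') ? wc - 0x20 : wc;
}

/* Weight for a binary demo collation: the code point itself. */
static unsigned long demo_weight_bin(unsigned long wc)
{
  return wc;
}

/* Single-file stand-in for including a template file with parameters set.
   MB_WC and WEIGHT are resolved at compile time, so the compiler can
   inline them; nothing is fetched through a charset structure. */
#define DEMO_DEFINE_STRNXFRM(NAME, MB_WC, WEIGHT)                        \
static size_t NAME(unsigned char *dst, size_t dstlen,                    \
                   const unsigned char *src, size_t srclen)              \
{                                                                        \
  unsigned char *d= dst;                                                 \
  unsigned char *de= dst + dstlen;                                       \
  const unsigned char *se= src + srclen;                                 \
  unsigned long wc;                                                      \
  int len;                                                               \
  assert(dst != NULL && src != NULL);                                    \
  while ((size_t) (de - d) >= 2 && (len= MB_WC(&wc, src, se)) > 0)       \
  {                                                                      \
    src+= len;                                                           \
    wc= WEIGHT(wc);                                                      \
    *d++= (unsigned char) (wc >> 8);                                     \
    *d++= (unsigned char) (wc & 0xFF);                                   \
  }                                                                      \
  return (size_t) (d - dst);                                             \
}

/* Two instantiations of the same template body. */
DEMO_DEFINE_STRNXFRM(demo_strnxfrm_general_ci, demo_mb_wc_8bit, demo_weight_ci)
DEMO_DEFINE_STRNXFRM(demo_strnxfrm_bin, demo_mb_wc_8bit, demo_weight_bin)
```

Each collation gets its own copy of the loop, trading some code size for the removal of per-character indirection.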
Issue Links
- blocks
  - MDEV-16413 test performance of distinct range queries (Closed)
- relates to
  - MDEV-17474 Change Unicode collation implementation from "handler" to "inline" style (Closed)
  - MDEV-33621 Unify duplicate code in my_wildcmp_uca_impl() and my_wildcmp_unicode_impl() (Closed)