  1. MariaDB Server
  2. MDEV-17474

Change Unicode collation implementation from "handler" to "inline" style

    Details

      Description

      Axel Schwenke found a performance bottleneck in the Unicode collation implementation (MDEV-16413).

      Under the terms of this task, we'll reorganize the code implementing Unicode collations (such as utf8_unicode_ci) into a new style that replaces virtual function calls with inlined code:

      The old style

      • There is one copy of every collation function (strnncoll(), strnncollsp(), hash_sort(), strnxfrm()), e.g.:

        static int my_strnncollsp_uca(CHARSET_INFO *cs, 
                                      my_uca_scanner_handler *scanner_handler,
                                      const uchar *s, size_t slen,
                                      const uchar *t, size_t tlen)
        

      • Character set-specific routines are passed in scanner_handler
      • scanner_handler->next() is called virtually (i.e. via a pointer to a function)
      • scanner_handler->next() itself calls cs->cset->mb_wc() virtually
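
      The double indirection described above can be illustrated with a minimal sketch. All toy_* names and types are hypothetical stand-ins, not the server's actual definitions, and the "weight" is just an upper-cased byte rather than a real UCA weight.

```c
#include <stddef.h>

/* Hypothetical stand-in for the charset: holds the decoder pointer. */
typedef struct toy_charset
{
  /* Decode one character from [s, e); returns bytes consumed, 0 at end. */
  int (*mb_wc)(unsigned long *wc, const unsigned char *s,
               const unsigned char *e);
} toy_charset;

typedef struct toy_scanner
{
  const unsigned char *pos, *end;
  const toy_charset *cs;
} toy_scanner;

/* Hypothetical stand-in for the scanner handler. */
typedef struct toy_scanner_handler
{
  /* Returns the next weight, or -1 at end of input. */
  int (*next)(toy_scanner *scanner);
} toy_scanner_handler;

/* Charset-specific routine: a one-byte decoder. */
static int toy_mb_wc_8bit(unsigned long *wc, const unsigned char *s,
                          const unsigned char *e)
{
  if (s >= e)
    return 0;
  *wc = *s;
  return 1;
}

/* Generic next(): reaches the decoder through a second pointer. */
static int toy_scanner_next_any(toy_scanner *scanner)
{
  unsigned long wc;
  int len= scanner->cs->mb_wc(&wc, scanner->pos, scanner->end); /* indirect */
  if (len <= 0)
    return -1;
  scanner->pos+= len;
  if (wc >= 'a' && wc <= 'z')      /* toy weight: fold ASCII case */
    wc-= 'a' - 'A';
  return (int) wc;
}

static const toy_charset toy_charset_8bit= { toy_mb_wc_8bit };
static const toy_scanner_handler toy_handler_any= { toy_scanner_next_any };

/* The single shared comparison: every weight costs two indirect calls. */
static int toy_strnncoll(const toy_charset *cs, const toy_scanner_handler *h,
                         const unsigned char *s, size_t slen,
                         const unsigned char *t, size_t tlen)
{
  toy_scanner ss= { s, s + slen, cs };
  toy_scanner ts= { t, t + tlen, cs };
  int sw, tw;
  do
  {
    sw= h->next(&ss);              /* indirect call via the handler */
    tw= h->next(&ts);
  } while (sw == tw && sw >= 0);
  return sw - tw;
}
```

      In this sketch the compiler cannot inline either h->next() or cs->mb_wc(), because both targets are only known at run time.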

      The new style

      • There are multiple implementations of each function, one per character set.
      • There is a shared file ctype-uca.ic, which is included multiple times, once per character set.
      • Character set-specific information is passed via macros:

        #include "ctype-utf8.h"
        #define MY_FUNCTION_NAME(x)   my_uca_ ## x ## _utf8mb3
        #define MY_MB_WC(scanner, wc, beg, end) (my_mb_wc_utf8mb3_quick(wc, beg, end))
        #define MY_LIKE_RANGE my_like_range_mb
        #include "ctype-uca.ic"
        

      • There are inline my_mb_wc_CSNAME_quick() implementations in new header files: ctype-utf8.h, ctype-ucs2.h, ctype-utf16.h, ctype-utf32.h
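
      The effect of the repeated inclusion can be sketched in a single file. Because one block cannot contain a second file, this sketch substitutes a function-generating macro (TOY_DEFINE_STRNNCOLL) for the multiple #include of a shared .ic file; the idea of stamping out one specialized function per character set is the same. All toy_* and TOY_* names are hypothetical, and the weight is a toy case fold rather than a real UCA weight.

```c
#include <stddef.h>

/* Inline decoders, one per (toy) character set.
   Each returns the number of bytes consumed, or 0 at end of input. */
static inline int toy_mb_wc_latin1_quick(unsigned long *wc,
                                         const unsigned char *s,
                                         const unsigned char *e)
{
  if (s >= e)
    return 0;
  *wc= s[0];
  return 1;
}

static inline int toy_mb_wc_ucs2_quick(unsigned long *wc,
                                       const unsigned char *s,
                                       const unsigned char *e)
{
  if (e - s < 2)
    return 0;
  *wc= ((unsigned long) s[0] << 8) | s[1];
  return 2;
}

/* Toy weight: the code point with ASCII case folded. */
static inline int toy_weight(unsigned long wc)
{
  if (wc >= 'a' && wc <= 'z')
    wc-= 'a' - 'A';
  return (int) wc;
}

/* Generates one comparison function per character set. MB_WC is named
   directly in the body, so the decoder call is resolved at compile time
   and can be inlined. */
#define TOY_DEFINE_STRNNCOLL(SUFFIX, MB_WC)                               \
static int toy_strnncoll_ ## SUFFIX(const unsigned char *s, size_t slen,  \
                                    const unsigned char *t, size_t tlen)  \
{                                                                         \
  const unsigned char *se= s + slen, *te= t + tlen;                       \
  for (;;)                                                                \
  {                                                                       \
    unsigned long swc= 0, twc= 0;                                         \
    int sl= MB_WC(&swc, s, se);                                           \
    int tl= MB_WC(&twc, t, te);                                           \
    int sw= sl ? toy_weight(swc) : -1;                                    \
    int tw= tl ? toy_weight(twc) : -1;                                    \
    if (sw != tw || sw < 0)                                               \
      return sw - tw;                                                     \
    s+= sl;                                                               \
    t+= tl;                                                               \
  }                                                                       \
}

TOY_DEFINE_STRNNCOLL(latin1, toy_mb_wc_latin1_quick)
TOY_DEFINE_STRNNCOLL(ucs2, toy_mb_wc_ucs2_quick)
```

      The two generated functions, toy_strnncoll_latin1() and toy_strnncoll_ucs2(), share no run-time dispatch: each is a separate copy of the loop with its own decoder compiled in.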

      The old version generated less executable code, but was slower.
      The new version will generate more code, but will be much faster: there will be no virtual function calls at all. Every call inside the new functions will be either inlined or at least resolved statically.

      Part 2: additional changes:

      • Add fast paths to handle ASCII characters
      • Add dedicated MY_COLLATION_HANDLERs for collations with no contractions (for the utf8 and utf8mb4 character sets). The choice between the full-featured handler and the "no contraction" handler should be made at collation initialization time.
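
      Both Part 2 ideas can be sketched together under stated assumptions. The toy_* types, the single "ch" contraction, the doubled-code-point weights, and the two-byte-only UTF-8 slow path are all hypothetical simplifications, not the server's actual data structures or weights.

```c
#include <stddef.h>

typedef struct toy_collation_handler
{
  int (*strnncoll)(const unsigned char *s, size_t slen,
                   const unsigned char *t, size_t tlen);
} toy_collation_handler;

typedef struct toy_collation
{
  int has_contractions;                 /* hypothetical tailoring property */
  const toy_collation_handler *handler; /* chosen once, at initialization */
} toy_collation;

/* Toy weight: case-folded code point, doubled to leave room for "ch". */
static inline int toy_weight(unsigned long wc)
{
  if (wc >= 'a' && wc <= 'z')
    wc-= 'a' - 'A';
  return (int) (wc * 2);
}

/* Returns the next weight and advances *p; returns -1 at end of input
   or on a byte sequence this sketch does not handle. */
static inline int toy_next(const unsigned char **p, const unsigned char *e,
                           int contractions)
{
  const unsigned char *s= *p;
  if (s >= e)
    return -1;
  if (s[0] < 0x80)                      /* ASCII fast path: no decoding */
  {
    if (contractions && e - s >= 2 &&
        (s[0] | 0x20) == 'c' && (s[1] | 0x20) == 'h')
    {
      *p= s + 2;
      return toy_weight('H') + 1;       /* "ch" sorts right after "h" */
    }
    *p= s + 1;
    return toy_weight(s[0]);
  }
  /* Slow path: two-byte UTF-8 sequences only, to keep the sketch short. */
  if (s[0] >= 0xC2 && s[0] <= 0xDF && e - s >= 2 && (s[1] & 0xC0) == 0x80)
  {
    *p= s + 2;
    return toy_weight(((unsigned long) (s[0] & 0x1F) << 6) | (s[1] & 0x3F));
  }
  return -1;
}

/* 'contractions' is a compile-time constant in both callers below, so
   the contraction test can be dropped entirely from the second one. */
static inline int toy_compare(const unsigned char *s, size_t slen,
                              const unsigned char *t, size_t tlen,
                              int contractions)
{
  const unsigned char *se= s + slen, *te= t + tlen;
  int sw, tw;
  do
  {
    sw= toy_next(&s, se, contractions);
    tw= toy_next(&t, te, contractions);
  } while (sw == tw && sw >= 0);
  return sw - tw;
}

static int toy_strnncoll_full(const unsigned char *s, size_t slen,
                              const unsigned char *t, size_t tlen)
{
  return toy_compare(s, slen, t, tlen, 1);
}

static int toy_strnncoll_nocontractions(const unsigned char *s, size_t slen,
                                        const unsigned char *t, size_t tlen)
{
  return toy_compare(s, slen, t, tlen, 0);
}

static const toy_collation_handler toy_handler_full=
{ toy_strnncoll_full };
static const toy_collation_handler toy_handler_nocontractions=
{ toy_strnncoll_nocontractions };

/* The handler is picked once here, never per comparison. */
static void toy_collation_init(toy_collation *cs, int has_contractions)
{
  cs->has_contractions= has_contractions;
  cs->handler= has_contractions ? &toy_handler_full
                                : &toy_handler_nocontractions;
}
```

      With this toy tailoring, "ch" compares greater than "d" through the full handler (it is treated as one element sorting after "h"), while the "no contraction" handler compares it as plain "c" followed by "h" and never executes the contraction test.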

              People

              • Assignee:
                bar Alexander Barkov
                Reporter:
                bar Alexander Barkov