  1. MariaDB Server
  2. MDEV-17502

Change Unicode xxx_general_ci and xxx_bin collation implementation to "inline" style

Details

    Description

      This task is similar to MDEV-17474, but for general_ci and _bin collations.

      The current implementation my_strnxfrm_unicode_internal() has some bottlenecks:

      • it uses cs->cset->mb_wc() virtual calls
      • it accesses cs->state and cs->caseinfo
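To illustrate the first bottleneck, here is a minimal sketch contrasting a per-character call through a function pointer with a direct call to a static function. All names here (demo_charset, demo_mb_wc_ascii, and so on) are hypothetical stand-ins invented for this illustration; they are not the server's CHARSET_INFO or its real decoders.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real CHARSET_INFO layout. */
typedef unsigned long demo_wc_t;
struct demo_charset;

typedef int (*demo_mb_wc_fn)(const struct demo_charset *cs, demo_wc_t *pwc,
                             const unsigned char *s, const unsigned char *e);

struct demo_charset
{
  demo_mb_wc_fn mb_wc;   /* resolved through the pointer on every character */
};

/* Demo decoder: accepts ASCII only.
   Returns the number of bytes consumed, or 0 at end of input
   or on a non-ASCII byte. */
static int demo_mb_wc_ascii(const struct demo_charset *cs, demo_wc_t *pwc,
                            const unsigned char *s, const unsigned char *e)
{
  (void) cs;
  if (s >= e || *s >= 0x80)
    return 0;
  *pwc = *s;
  return 1;
}

/* "Before" shape: every character goes through cs->mb_wc(),
   which the compiler generally cannot inline. */
static size_t demo_count_chars_indirect(const struct demo_charset *cs,
                                        const unsigned char *s,
                                        const unsigned char *e)
{
  size_t n= 0;
  demo_wc_t wc;
  int len;
  while ((len= cs->mb_wc(cs, &wc, s, e)) > 0)
  {
    s+= len;
    n++;
  }
  return n;
}

/* "After" shape: the decoder is named directly, so the compiler
   is free to inline it into the loop. */
static size_t demo_count_chars_direct(const unsigned char *s,
                                      const unsigned char *e)
{
  size_t n= 0;
  demo_wc_t wc;
  int len;
  while ((len= demo_mb_wc_ascii(NULL, &wc, s, e)) > 0)
  {
    s+= len;
    n++;
  }
  return n;
}
```

Both functions return the same count for the same input; only the way the decoder is reached differs.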

      We'll add new strnxfrm-family function templates to strings/strcoll.ic.
      The functions my_strnxfrm_unicode_internal(), my_strnxfrm_unicode(), and my_strnxfrm_unicode_nopad() will move from strings/ctype-utf8.c into these templates in strings/strcoll.ic.

      Every collation will include strings/strcoll.ic and pass collation-specific parameters, such as its mb_wc() function and the related UNICASE data.

      Additionally, we'll add fast paths for ASCII data.
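As a rough illustration of what an ASCII fast path can look like, the sketch below produces two-byte big-endian weights and handles bytes below 0x80 without calling the multi-byte decoder. The names demo_utf8_decode() and demo_strnxfrm() are hypothetical, the decoder is simplified (up to three bytes, minimal validation), and non-ASCII characters are emitted unchanged rather than mapped through case tables; none of this is the actual strcoll.ic code.

```c
#include <stddef.h>

/* Hypothetical simplified UTF-8 decoder (1 to 3 bytes, minimal validation).
   Returns the number of bytes consumed, or 0 on end of input or bad data. */
static int demo_utf8_decode(unsigned long *pwc,
                            const unsigned char *s, const unsigned char *e)
{
  if (s >= e)
    return 0;
  if (s[0] < 0x80)
  {
    *pwc= s[0];
    return 1;
  }
  if (s[0] >= 0xC2 && s[0] < 0xE0)
  {
    if (e - s < 2 || (s[1] & 0xC0) != 0x80)
      return 0;
    *pwc= ((unsigned long) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    return 2;
  }
  if (s[0] >= 0xE0 && s[0] < 0xF0)
  {
    if (e - s < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
      return 0;
    *pwc= ((unsigned long) (s[0] & 0x0F) << 12) |
          ((unsigned long) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    return 3;
  }
  return 0;
}

/* Writes one 2-byte big-endian weight per character of src into dst.
   Returns the number of bytes written. */
static size_t demo_strnxfrm(unsigned char *dst, size_t dstlen,
                            const unsigned char *src, size_t srclen)
{
  unsigned char *d= dst, *de= dst + dstlen;
  const unsigned char *s= src, *se= src + srclen;
  unsigned long wc;
  int len;

  while (s < se && de - d >= 2)
  {
    if (*s < 0x80)
    {
      /* Fast path: an ASCII byte needs no decoder call and no table lookup. */
      unsigned char c= *s++;
      if (c >= 'a' && c <= 'z')
        c= (unsigned char) (c - 'a' + 'A');
      *d++= 0;
      *d++= c;
      continue;
    }
    /* Slow path: full multi-byte decoding. */
    len= demo_utf8_decode(&wc, s, se);
    if (len <= 0)
      break;
    s+= len;
    /* Demo assumption: non-ASCII characters keep their code point as weight. */
    *d++= (unsigned char) (wc >> 8);
    *d++= (unsigned char) (wc & 0xFF);
  }
  return (size_t) (d - dst);
}
```

The fast path pays off most on inputs like the benchmark strings in this issue, which consist entirely of ASCII letters and spaces.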

      After these changes, the template instantiation (e.g. for utf8_general_ci) will look like this:

      #define MY_FUNCTION_NAME(x)      my_ ## x ## _utf8_general_ci
      #define DEFINE_STRNXFRM_UNICODE
      #define DEFINE_STRNXFRM_UNICODE_NOPAD
      #define MY_MB_WC(cs, pwc, s, e)  my_mb_wc_utf8mb3_quick(pwc, s, e)
      #define OPTIMIZE_ASCII           1
      #define UNICASE_MAXCHAR          MY_UNICASE_INFO_DEFAULT_MAXCHAR
      #define UNICASE_PAGE0            my_unicase_default_page00
      #define UNICASE_PAGES            my_unicase_default_pages
      ...
      #include "strcoll.ic"
      

      The template included in this example will:

      • use my_mb_wc_utf8mb3_quick() directly (inline or at least statically), instead of a virtual call.
      • use MY_UNICASE_INFO_DEFAULT_MAXCHAR, my_unicase_default_page00, my_unicase_default_pages directly, without dereferencing members of CHARSET_INFO.
      • enable the fast path for ASCII data
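The macro-parameter pattern above can be sketched in a single file. In the sketch below, the part after the "template body" comment stands in for what an included template might contain: the function name is built by token pasting, and the case tables are reached through macros that expand to fixed symbols rather than through a CHARSET_INFO pointer. Every DEMO_ and demo_ name is hypothetical, the table holds only sort weights, and the identity fallback for missing pages and the 0xFFFD result for out-of-range characters are assumptions of this demo.

```c
#include <stddef.h>

/* Hypothetical demo tables: one 256-entry page of sort weights for
   U+0000..U+00FF, plus a page directory indexed by (wc >> 8). */
static unsigned short demo_page00[256];
static const unsigned short *demo_pages[256]= { demo_page00 }; /* rest NULL */

/* Fill page 0: a..z sort as A..Z, everything else sorts as itself. */
static void demo_init_page00(void)
{
  unsigned int i;
  for (i= 0; i < 256; i++)
    demo_page00[i]= (unsigned short)
                    ((i >= 'a' && i <= 'z') ? i - 'a' + 'A' : i);
}

/* Per-collation parameters, in the spirit of the #define block above. */
#define DEMO_FUNCTION_NAME(x)  demo_ ## x ## _general_ci
#define DEMO_UNICASE_MAXCHAR   0xFFFF
#define DEMO_UNICASE_PAGE0     demo_page00
#define DEMO_UNICASE_PAGES     demo_pages

/* ---- template body (what an included file might look like) ---- */

/* Expands to: static unsigned long demo_tosort_general_ci(unsigned long wc) */
static unsigned long DEMO_FUNCTION_NAME(tosort)(unsigned long wc)
{
  if (wc < 0x100)
    return DEMO_UNICASE_PAGE0[wc];          /* direct access to page 0 */
  if (wc <= DEMO_UNICASE_MAXCHAR)
  {
    const unsigned short *page= DEMO_UNICASE_PAGES[wc >> 8];
    return page ? page[wc & 0xFF] : wc;     /* demo: no page means identity */
  }
  return 0xFFFD;                            /* demo: replacement character */
}
```

After calling demo_init_page00() once, demo_tosort_general_ci('a') yields 'A'. A second collation could reuse the same body with a different set of macro definitions, which is the role the #include of strcoll.ic plays in the description.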

          Activity

            bar Alexander Barkov added a comment - edited

            Performance statistics:

            Short range searches with ORDER BY

            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (pk SERIAL, field CHAR(120) CHARACTER SET utf8 COLLATE utf8_general_ci);
            INSERT INTO t1 (field) VALUES ('a'),('b'),('c'),('d');
            INSERT t1 (field)
            WITH  RECURSIVE int_seq AS (
              SELECT 1 AS val
              UNION ALL
              SELECT val + 1
              FROM int_seq
              WHERE val < 1000
            ) SELECT 'a' FROM int_seq;
             
            DROP PROCEDURE IF EXISTS p1;
            DELIMITER $$
            CREATE PROCEDURE p1()
            BEGIN
              DECLARE a INT DEFAULT 100000;
              WHILE (a > 0)
              DO
                SELECT DISTINCT field INTO @a FROM t1 WHERE pk BETWEEN 1 AND 11 ORDER BY field LIMIT 1;
                SET a=a-1;
              END WHILE;
            END;
            $$
            DELIMITER ;
            CALL p1;
            

            • 7.99 sec - MySQL-8.0
            • 7.83 sec - MariaDB-10.4 before MDEV-17502
            • 7.59 sec - MariaDB-10.4 after MDEV-17502

            Micro benchmark for WEIGHT_STRING() for utf8_general_ci

            SET NAMES utf8 COLLATE utf8_general_ci;
            SET @a=CONCAT('a', REPEAT(' ',359));
            SELECT BENCHMARK(500000, WEIGHT_STRING(@a,1024,960,128));
            

            • 0.74 sec - MySQL-8.0
            • 0.84 sec - MariaDB-10.4 before MDEV-17502
            • 0.41 sec - MariaDB-10.4 after MDEV-17502

            Micro benchmark for WEIGHT_STRING() for utf8_bin

            SET NAMES utf8 COLLATE utf8_bin;
            SET @a=CONCAT('a', REPEAT(' ',359));
            SELECT BENCHMARK(500000, WEIGHT_STRING(@a,1024,960,128));
            

            • 0.55 sec - MySQL-8.0
            • 0.60 sec - MariaDB-10.4 before MDEV-17502
            • 0.37 sec - MariaDB-10.4 after MDEV-17502

            People

              bar Alexander Barkov