Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17511

Improve performance for ORDER BY with a CHAR(N) CHARACTER SET utf8_unicode_ci

Details

    Description

      Note, this problem is repeatable with all UCA collations with PAD SPACE attribute. This MDEV uses utf8_uncide_ci as an example for explanation.

      There is a bottleneck in these functions:

      • my_uca_strnxfrm_no_contractions_utf8mb3
      • my_uca_strnxfrm_onelevel_internal_no_contractions_utf8mb3
        called from Field_string::sort_string() in this scenario:

      CREATE OR REPLACE TABLE t1 (a CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
      INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');
      SELECT * FROM t1 ORDER BY a;
      

      Generating weights for trailing spaces (which almost always present in case of CHAR) seems to be CPU hungry.
      my_uca_strnxfrm_onelevel_internal_no_contractions_utf8mb3() scans trailing spaces as normal characters and so it calls my_uca_scanner_next_no_contractions_utf8mb3() for every trailing space and then calculate its weight using UCA weights.

      It should be faster to trip trailing spaces in my_uca_strnxfrm_no_contractions_utf8mb3() before calling my_uca_strnxfrm_onelevel_internal_no_contractions_utf8mb3(). If we because of this change return a too short key, the caller will append weights for implicit spaces anyway, up to the desired key size. This will effectively generate exactly the same sortable key result.

      Appending weights for implicit spaces is much less CPU hungry that a loop with scanner_next calls.

      Attachments

        Issue Links

          Activity

            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (pk SERIAL, field CHAR(120) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
            INSERT INTO t1 (field) VALUES ('a'),('b'),('c'),('d');
            INSERT t1 (field)
            WITH  RECURSIVE int_seq AS (
              SELECT 1 AS val
              UNION ALL
              SELECT val + 1
              FROM int_seq
              WHERE val < 1000
            ) SELECT 'a' FROM int_seq;
             
            DROP PROCEDURE IF EXISTS p1;
            DELIMITER $$
            CREATE PROCEDURE p1()
            BEGIN
              DECLARE a INT DEFAULT 100000;
              WHILE (a > 0)
              DO
                SELECT DISTINCT field INTO @a FROM t1 WHERE pk BETWEEN 1 AND 11 ORDER BY field LIMIT 1;
                SET a=a-1;
              END WHILE;
            END;
            $$
            DELIMITER ;
            CALL p1;
            

            • 9.37 sec - MariaDB-10.4 before MDEV-17511
            • 8.75 sec - MariaDB-10.4 after MDEV-17511
            • 8.30 sec - MySQL-8.0

            SET NAMES utf8 COLLATE utf8_unicode_ci;
            SET @a=CONCAT('a', REPEAT(' ',359));
            SELECT BENCHMARK(500000, WEIGHT_STRING(@a,1024,960,128));
            

            bar Alexander Barkov added a comment - DROP TABLE IF EXISTS t1; CREATE TABLE t1 (pk SERIAL, field CHAR (120) CHARACTER SET utf8 COLLATE utf8_unicode_ci); INSERT INTO t1 (field) VALUES ( 'a' ),( 'b' ),( 'c' ),( 'd' ); INSERT t1 (field) WITH RECURSIVE int_seq AS ( SELECT 1 AS val UNION ALL SELECT val + 1 FROM int_seq WHERE val < 1000 ) SELECT 'a' FROM int_seq;   DROP PROCEDURE IF EXISTS p1; DELIMITER $$ CREATE PROCEDURE p1() BEGIN DECLARE a INT DEFAULT 100000; WHILE (a > 0) DO SELECT DISTINCT field INTO @a FROM t1 WHERE pk BETWEEN 1 AND 11 ORDER BY field LIMIT 1; SET a=a-1; END WHILE; END ; $$ DELIMITER ; CALL p1; 9.37 sec - MariaDB-10.4 before MDEV-17511 8.75 sec - MariaDB-10.4 after MDEV-17511 8.30 sec - MySQL-8.0 SET NAMES utf8 COLLATE utf8_unicode_ci; SET @a=CONCAT( 'a' , REPEAT( ' ' ,359)); SELECT BENCHMARK(500000, WEIGHT_STRING(@a,1024,960,128)); 0.85 sec MariaDB-10.4 before MDEV-17511 0.34 sec MariaDB-10.4 after MDEV-17511

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.