[MDEV-17534] Implement fast path for ASCII range in strnxfrm_onelevel_internal() Created: 2018-10-24  Updated: 2018-10-24  Resolved: 2018-10-24

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 10.4.0

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-16413 test performance of distinct range qu... Closed

 Description   

During benchmarking, Axel identified a bottleneck inside MY_FUNCTION_NAME(strnxfrm_onelevel_internal) in ctype-uca.ic.

The problem is that calling my_uca_scanner_init_any() and then looping over MY_FUNCTION_NAME(scanner_next)() calls is expensive.

Under this task, we'll add a fast path for the ASCII range to strnxfrm_onelevel_internal().

  • In the best scenario, when the input string is pure ASCII, execution will not even reach my_uca_scanner_init_any().
  • In a less favorable scenario, when an ASCII prefix is followed by a non-ASCII suffix, the ASCII prefix will still be processed faster.
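The two scenarios above can be sketched in C. This is a minimal toy model, not the code in ctype-uca.ic: the names ascii_weight, init_toy_weights(), toy_strnxfrm() and slow_path() are hypothetical, the weights are made up, and the sketch assumes every ASCII byte maps to at most one 16-bit weight and takes part in no contraction or expansion (a tailoring with ASCII contractions, such as Czech "ch", would need an extra check). The slow_path() stub stands in for my_uca_scanner_init_any() plus the scanner_next() loop and only counts how often it is entered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy weight table for bytes 0x00..0x7F (NOT real UCA weights).
   0 marks an ignorable character that produces no weight. */
static uint16_t ascii_weight[128];

static void init_toy_weights(void)
{
  for (int c = 0; c < 128; c++)
    ascii_weight[c] = (c < 0x20 || c == 0x7F) ? 0 : (uint16_t) (0x0200 + c);
  for (int c = 'a'; c <= 'z'; c++)       /* case-insensitive: 'a' == 'A' */
    ascii_weight[c] = ascii_weight[c - 'a' + 'A'];
}

/* Counts how often the hypothetical slow path is entered. */
static int slow_path_calls = 0;

/* Stub for the scanner-based path (scanner init + scanner_next() loop in the
   real code). It only records the call and writes nothing. */
static size_t slow_path(const unsigned char *src, size_t src_len,
                        unsigned char *dst, size_t dst_cap)
{
  (void) src; (void) src_len; (void) dst; (void) dst_cap;
  slow_path_calls++;
  return 0;
}

/* Writes 2-byte big-endian weights into dst; returns bytes written. */
static size_t toy_strnxfrm(const unsigned char *src, size_t src_len,
                           unsigned char *dst, size_t dst_cap)
{
  size_t i = 0, o = 0;

  /* Fast path: one table lookup per ASCII byte, no scanner state. */
  for (; i < src_len && src[i] < 0x80; i++)
  {
    uint16_t w = ascii_weight[src[i]];
    if (w == 0)
      continue;                          /* ignorable: emit nothing */
    if (o + 2 > dst_cap)
      return o;                          /* output buffer is full */
    dst[o++] = (unsigned char) (w >> 8);
    dst[o++] = (unsigned char) (w & 0xFF);
  }

  /* Slow path only for the non-ASCII remainder, if there is one. */
  assert(o <= dst_cap);
  if (i < src_len)
    o += slow_path(src + i, src_len - i, dst + o, dst_cap - o);
  return o;
}
```

For a pure-ASCII input such as "Abc" the stub is never called; for an input with a non-ASCII suffix it is called once, for the suffix only. The point of the design is that the common case costs one table lookup per byte, while the per-string scanner setup is paid only when it is actually needed.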


 Comments   
Comment by Alexander Barkov [ 2018-10-24 ]

Micro-benchmark statistics

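-- 360 ASCII characters, i.e. the best case for the new fast path
SET @a=CONCAT(REPEAT(_utf8'a' COLLATE utf8_unicode_ci, 360));
-- compute the weight string of @a 500000 times
SELECT BENCHMARK(500000, WEIGHT_STRING(@a,1024,960,128));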

Macro-benchmark statistics for short-range ORDER BY queries

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (pk SERIAL, field CHAR(120) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
INSERT t1 (field)
WITH RECURSIVE int_seq AS (
  SELECT 1 AS val
  UNION ALL
  SELECT val + 1
  FROM int_seq
  WHERE val < 1000
) SELECT CONCAT('a',val%255,REPEAT('b',100)) FROM int_seq;
DROP PROCEDURE IF EXISTS p1;
DELIMITER $$
CREATE PROCEDURE p1()
BEGIN
  -- run the short-range ORDER BY query 100000 times
  DECLARE a INT DEFAULT 100000;
  WHILE (a > 0)
  DO
    SELECT field INTO @a FROM t1 WHERE pk BETWEEN 1 AND 11 ORDER BY field LIMIT 1;
    SET a=a-1;
  END WHILE;
END;
$$
DELIMITER ;
CALL p1;
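As a quick check of what the macro-benchmark feeds into the sort, the following C sketch rebuilds the values produced by CONCAT('a', val%255, REPEAT('b',100)) for val = 1..1000. The helper scan_benchmark_values() is hypothetical and only mirrors the SQL. It shows that every value is pure ASCII and 102 to 104 characters long, so it fits in CHAR(120) (any space padding is ASCII as well). This suggests that the ORDER BY field query exercises the best-case, pure-ASCII scenario described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: mirrors CONCAT('a', val % 255, REPEAT('b', 100))
   for val = 1..1000 and reports the shortest and longest value and
   whether every byte is in the ASCII range (< 0x80). */
static void scan_benchmark_values(size_t *min_len, size_t *max_len,
                                  int *all_ascii)
{
  char buf[128];

  *min_len = sizeof buf;
  *max_len = 0;
  *all_ascii = 1;
  for (int val = 1; val <= 1000; val++)
  {
    /* 'a' followed by the decimal value of val % 255 (0..254) */
    int n = snprintf(buf, sizeof buf, "a%d", val % 255);
    assert(n > 0 && n + 100 < (int) sizeof buf);
    /* followed by 100 'b' characters */
    memset(buf + n, 'b', 100);
    buf[n + 100] = '\0';

    size_t len = strlen(buf);
    if (len < *min_len)
      *min_len = len;
    if (len > *max_len)
      *max_len = len;
    for (size_t k = 0; k < len; k++)
      if ((unsigned char) buf[k] >= 0x80)
        *all_ascii = 0;
  }
}
```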

Generated at Thu Feb 08 08:37:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.