[MDEV-17511] Improve performance for ORDER BY with a CHAR(N) CHARACTER SET utf8_unicode_ci - Jira

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.4.0
Component/s: Character Sets
Labels:
None

Description

Note: this problem is repeatable with all UCA collations that have the PAD SPACE attribute. This MDEV uses utf8_unicode_ci as an example for explanation.

There is a bottleneck in these functions:

my_uca_strnxfrm_no_contractions_utf8mb3
my_uca_strnxfrm_onelevel_internal_no_contractions_utf8mb3
called from Field_string::sort_string() in this scenario:

CREATE OR REPLACE TABLE t1 (a CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci);

INSERT INTO t1 VALUES ('a'),('b'),('c'),('d');

SELECT * FROM t1 ORDER BY a;

Generating weights for trailing spaces (which are almost always present in the case of CHAR) seems to be CPU hungry.
my_uca_strnxfrm_onelevel_internal_no_contractions_utf8mb3() scans trailing spaces as normal characters, so it calls my_uca_scanner_next_no_contractions_utf8mb3() for every trailing space and then calculates its weight using UCA weights.

It should be faster to trim trailing spaces in my_uca_strnxfrm_no_contractions_utf8mb3() before calling my_uca_strnxfrm_onelevel_internal_no_contractions_utf8mb3(). If this change makes the function return a key that is too short, the caller will append weights for implicit spaces anyway, up to the desired key size. The resulting sort key is therefore exactly the same.

Appending weights for implicit spaces is much less CPU hungry than a loop with scanner_next calls.
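
The equivalence argued above can be illustrated with a small, self-contained sketch. This is not the MariaDB code: toy_weight(), scan_weights(), pad_with_space_weights(), sort_key_naive(), sort_key_trimmed() and keys_agree() are hypothetical names, the input is treated as single-byte characters, and the 2-byte weight is a stand-in for a real UCA table lookup. It only shows that trimming trailing spaces before the per-character scan and then padding with space weights gives the same key as scanning every trailing space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { WEIGHT_SIZE = 2 };

/* Toy weight (assumption): two bytes per character, not real UCA data. */
void toy_weight(unsigned char ch, unsigned char w[WEIGHT_SIZE])
{
  w[0] = 0x00;
  w[1] = ch;
}

/* Per-character scan: stands in for the expensive scanner loop.
   Writes whole weights only and returns the number of bytes written. */
size_t scan_weights(const char *src, size_t srclen,
                    unsigned char *dst, size_t dstlen)
{
  size_t pos = 0;
  for (size_t i = 0; i < srclen && pos + WEIGHT_SIZE <= dstlen; i++)
  {
    toy_weight((unsigned char) src[i], dst + pos);
    pos += WEIGHT_SIZE;
  }
  return pos;
}

/* Cheap path: fill the rest of the key with the weight of a space.
   A leftover byte that cannot hold a whole weight is zeroed. */
void pad_with_space_weights(unsigned char *dst, size_t pos, size_t dstlen)
{
  unsigned char w[WEIGHT_SIZE];
  toy_weight(' ', w);
  for (; pos + WEIGHT_SIZE <= dstlen; pos += WEIGHT_SIZE)
    memcpy(dst + pos, w, WEIGHT_SIZE);
  for (; pos < dstlen; pos++)
    dst[pos] = 0;
}

/* Scans every character, including trailing spaces, then pads. */
size_t sort_key_naive(const char *src, size_t srclen,
                      unsigned char *dst, size_t dstlen)
{
  size_t pos = scan_weights(src, srclen, dst, dstlen);
  pad_with_space_weights(dst, pos, dstlen);
  return dstlen;
}

/* Trims trailing spaces first, so the scan loop sees fewer characters;
   the padding step supplies the space weights instead. */
size_t sort_key_trimmed(const char *src, size_t srclen,
                        unsigned char *dst, size_t dstlen)
{
  while (srclen > 0 && src[srclen - 1] == ' ')
    srclen--;
  size_t pos = scan_weights(src, srclen, dst, dstlen);
  pad_with_space_weights(dst, pos, dstlen);
  return dstlen;
}

/* Returns 1 when both strategies produce the same key of keylen bytes. */
int keys_agree(const char *src, size_t srclen, size_t keylen)
{
  unsigned char naive[64], trimmed[64];
  assert(keylen <= sizeof naive);
  sort_key_naive(src, srclen, naive, keylen);
  sort_key_trimmed(src, srclen, trimmed, keylen);
  return memcmp(naive, trimmed, keylen) == 0;
}
```

For a CHAR(10) value such as 'a' followed by nine spaces, keys_agree() reports identical keys, while sort_key_trimmed() runs the scan loop for one character instead of ten.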

Issue Links

blocks

MDEV-16413 test performance of distinct range queries

Closed

People

Assignee: Alexander Barkov

Reporter: Alexander Barkov

Votes: 0

Watchers: 1

Dates

Created: 2018-10-21 00:59

Updated: 2018-10-21 17:40

Resolved: 2018-10-21 17:40

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

0.5d
