Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
The patch for MDEV-27266 introduced a new function my_uca_level_booster_simple_prefix_cmp(). It calculates the length of a simple prefix.
It efficiently skips equal prefixes consisting of simple mapping when one or two bytes produce one or two weights.
Whenever it comes to a position with non-simple mapping or with non-equal weight, it breaks and returns the length of the equal prefix.
The caller function after that initializes two instances of my_uca_scanner and goes to a less efficient comparison loop.
In cases when my_uca_level_booster_simple_prefix_cmp() returns because of a non-simple mapping found, going to the slower loop is OK.
But in cases when my_uca_level_booster_simple_prefix_cmp() breaks at some position because:
- both strings have simple mappings at this position
- but weights are different
going to the slower loop is not really necessary. The weights can be compared right inside my_uca_level_booster_simple_prefix_cmp().
Let's do the following:
1. Rename the function and change its signature from
static size_t |
my_uca_level_booster_equal_prefix_length(const MY_UCA_LEVEL_BOOSTER *booster, |
const uchar *s, size_t slen, |
const uchar *t, size_t tlen) |
to
static int |
my_uca_level_booster_simple_prefix_cmp(my_uca_scanner *sscanner,
|
my_uca_scanner *tscanner,
|
const MY_UCA_LEVEL_BOOSTER *booster) |
so the function returns the difference between two strings as follows:
- a negative number if the string pointed by sscanner appeared to be slower
- a positive number if the string pointed by sscanner appeared to be greater
- 0 if a complex mapping was found
The caller code will change as follows:
- The two my_uca_scanner instances will now be initialized before calling my_uca_level_booster_simple_prefix_cmp()
- On a non-zero value returned from my_uca_level_booster_simple_prefix_cmp(), the caller will exit
- On zero returned my_uca_level_booster_simple_prefix_cmp() the caller will continue with a slower loop
The attached patch MDEV-32340.v07.diff is a working prototype.
Benchmarking
Micro-benchmarking can be done using this SQL script:
-- Warning up
|
SET NAMES utf8mb4 COLLATE utf8mb4_uca1400_ai_ci; |
DO BENCHMARK(10000000,strcmp('xxxx','xxxx')); |
|
-- Benchmarking
|
DO BENCHMARK(10000000,strcmp('','')); |
DO BENCHMARK(10000000,strcmp('a','a')); |
DO BENCHMARK(10000000,strcmp('aa','aa')); |
DO BENCHMARK(10000000,strcmp('aaa','aaa')); |
DO BENCHMARK(10000000,strcmp('aaaa','aaaa')); |
DO BENCHMARK(10000000,strcmp('aaaaa','aaaaa')); |
DO BENCHMARK(10000000,strcmp('aaaaaa','aaaaaa')); |
DO BENCHMARK(10000000,strcmp('aaaaaaa','aaaaaaa')); |
DO BENCHMARK(10000000,strcmp('aaaaaaaa','aaaaaaaa')); |
DO BENCHMARK(10000000,strcmp('aaaaaaaaa','aaaaaaaaa')); |
DO BENCHMARK(10000000,strcmp('я','я')); |
DO BENCHMARK(10000000,strcmp('яя','яя')); |
DO BENCHMARK(10000000,strcmp('яяя','яяя')); |
DO BENCHMARK(10000000,strcmp('яяяя','яяяя')); |
DO BENCHMARK(10000000,strcmp('яяяяя','яяяяя')); |
DO BENCHMARK(10000000,strcmp('ss','ß')); |
DO BENCHMARK(10000000,strcmp('ssss','ßß')); |
DO BENCHMARK(10000000,strcmp('ssssss','ßßß')); |
DO BENCHMARK(10000000,strcmp('ắ','ắ')); |
DO BENCHMARK(10000000,strcmp('ắắ','ắắ')); |
DO BENCHMARK(10000000,strcmp('ắắắ','ắắắ')); |
DO BENCHMARK(10000000,strcmp('ắắắắ','ắắắắ')); |
DO BENCHMARK(10000000,strcmp('ắắắắắ','ắắắắắ')); |
DO BENCHMARK(10000000,strcmp('👍','👍')); |
DO BENCHMARK(10000000,strcmp('👍👍','👍👍')); |
DO BENCHMARK(10000000,strcmp('👍👍👍','👍👍👍')); |
DO BENCHMARK(10000000,strcmp('👍👍👍👍','👍👍👍👍')); |
DO BENCHMARK(10000000,strcmp('👍👍👍👍👍','👍👍👍👍👍')); |
Benchmarking shows a very good performance improvement on 1-byte and 2-byte characters with insignificant slowdown on 3-byte and 4-byte characters:
- Column#1 - the function passed to BENCHMARK()
- Column#2 - time spent before the change
- Column#3 - time spent after the change
# 1-byte characters: Notable improvement
|
strcmp('','') 0.175 0.103
|
strcmp('a','a') 0.233 0.116
|
strcmp('aa','aa') 0.180 0.114
|
strcmp('aaa','aaa') 0.238 0.122
|
strcmp('aaaa','aaaa') 0.192 0.127
|
strcmp('aaaaa','aaaaa') 0.256 0.137
|
strcmp('aaaaaa','aaaaaa') 0.207 0.140
|
strcmp('aaaaaaa','aaaaaaa') 0.271 0.147
|
strcmp('aaaaaaaa','aaaaaaaa') 0.220 0.152
|
strcmp('aaaaaaaaa','aaaaaaaaa') 0.295 0.162
|
|
# 2-byte characters: Notable improvement
|
strcmp('я','я') 0.184 0.115
|
strcmp('яя','яя') 0.193 0.127
|
strcmp('яяя','яяя') 0.208 0.141
|
strcmp('яяяя','яяяя') 0.219 0.154
|
strcmp('яяяяя','яяяяя') 0.231 0.167
|
|
# Two 1-byte characters vs 2-byte character with expansion: Notable improvement
|
strcmp('ss','ß') 0.185 0.116
|
strcmp('ssss','ßß') 0.199 0.127
|
strcmp('ssssss','ßßß') 0.208 0.141
|
|
-- 3-byte characters: Insignificant slowdown
|
strcmp('ắ','ắ') 0.306 0.317
|
strcmp('ắắ','ắắ') 0.433 0.450
|
strcmp('ắắắ','ắắắ') 0.571 0.582
|
strcmp('ắắắắ','ắắắắ') 0.698 0.711
|
strcmp('ắắắắắ','ắắắắắ') 0.822 0.839
|
|
-- 4-byte characters: Insignificant slowdown
|
strcmp('👍','👍') 0.331 0.342
|
strcmp('👍👍','👍👍') 0.479 0.501
|
strcmp('👍👍👍','👍👍👍') 0.639 0.649
|
strcmp('👍👍👍👍','👍👍👍👍') 0.787 0.801
|
strcmp('👍👍👍👍👍','👍👍👍👍👍') 1.043 1.053
|
Attachments
Issue Links
- relates to
-
MDEV-27265 Improve contraction performance in UCA collations
- Closed
-
MDEV-27266 Improve UCA collation performance for utf8mb3 and utf8mb4
- Closed