|
The patch for MDEV-27266 introduced a new function my_uca_level_booster_simple_prefix_cmp(). It calculates the length of a simple prefix.
It efficiently skips equal prefixes consisting of simple mapping when one or two bytes produce one or two weights.
Whenever it comes to a position with non-simple mapping or with non-equal weight, it breaks and returns the length of the equal prefix.
The caller function after that initializes two instances of my_uca_scanner and goes to a less efficient comparison loop.
In cases when my_uca_level_booster_simple_prefix_cmp() returns because of a non-simple mapping found, going to the slower loop is OK.
But in cases when my_uca_level_booster_simple_prefix_cmp() breaks at some position because:
- both strings have simple mappings at this position
- but weights are different
going to the slower loop is not really necessary. The weights can be compared right inside my_uca_level_booster_simple_prefix_cmp().
Let's do the following:
1. Rename the function and change its signature from
static size_t
|
my_uca_level_booster_equal_prefix_length(const MY_UCA_LEVEL_BOOSTER *booster,
|
const uchar *s, size_t slen,
|
const uchar *t, size_t tlen)
|
to
static int
|
my_uca_level_booster_simple_prefix_cmp(my_uca_scanner *sscanner,
|
my_uca_scanner *tscanner,
|
const MY_UCA_LEVEL_BOOSTER *booster)
|
so the function returns the difference between two strings as follows:
- a negative number if the string pointed by sscanner appeared to be slower
- a positive number if the string pointed by sscanner appeared to be greater
- 0 if a complex mapping was found
The caller code will change as follows:
- The two my_uca_scanner instances will now be initialized before calling my_uca_level_booster_simple_prefix_cmp()
- On a non-zero value returned from my_uca_level_booster_simple_prefix_cmp(), the caller will exit
- On zero returned my_uca_level_booster_simple_prefix_cmp() the caller will continue with a slower loop
The attached patch MDEV-32340.v07.diff is a working prototype.
Benchmarking
Micro-benchmarking can be done using this SQL script:
-- Warning up
|
SET NAMES utf8mb4 COLLATE utf8mb4_uca1400_ai_ci;
|
DO BENCHMARK(10000000,strcmp('xxxx','xxxx'));
|
|
-- Benchmarking
|
DO BENCHMARK(10000000,strcmp('',''));
|
DO BENCHMARK(10000000,strcmp('a','a'));
|
DO BENCHMARK(10000000,strcmp('aa','aa'));
|
DO BENCHMARK(10000000,strcmp('aaa','aaa'));
|
DO BENCHMARK(10000000,strcmp('aaaa','aaaa'));
|
DO BENCHMARK(10000000,strcmp('aaaaa','aaaaa'));
|
DO BENCHMARK(10000000,strcmp('aaaaaa','aaaaaa'));
|
DO BENCHMARK(10000000,strcmp('aaaaaaa','aaaaaaa'));
|
DO BENCHMARK(10000000,strcmp('aaaaaaaa','aaaaaaaa'));
|
DO BENCHMARK(10000000,strcmp('aaaaaaaaa','aaaaaaaaa'));
|
DO BENCHMARK(10000000,strcmp('я','я'));
|
DO BENCHMARK(10000000,strcmp('яя','яя'));
|
DO BENCHMARK(10000000,strcmp('яяя','яяя'));
|
DO BENCHMARK(10000000,strcmp('яяяя','яяяя'));
|
DO BENCHMARK(10000000,strcmp('яяяяя','яяяяя'));
|
DO BENCHMARK(10000000,strcmp('ss','ß'));
|
DO BENCHMARK(10000000,strcmp('ssss','ßß'));
|
DO BENCHMARK(10000000,strcmp('ssssss','ßßß'));
|
DO BENCHMARK(10000000,strcmp('ắ','ắ'));
|
DO BENCHMARK(10000000,strcmp('ắắ','ắắ'));
|
DO BENCHMARK(10000000,strcmp('ắắắ','ắắắ'));
|
DO BENCHMARK(10000000,strcmp('ắắắắ','ắắắắ'));
|
DO BENCHMARK(10000000,strcmp('ắắắắắ','ắắắắắ'));
|
DO BENCHMARK(10000000,strcmp('',''));
|
DO BENCHMARK(10000000,strcmp('',''));
|
DO BENCHMARK(10000000,strcmp('',''));
|
DO BENCHMARK(10000000,strcmp('',''));
|
DO BENCHMARK(10000000,strcmp('',''));
|
Benchmarking shows a very good performance improvement on 1-byte and 2-byte characters with insignificant slowdown on 3-byte and 4-byte characters:
- Column#1 - the function passed to BENCHMARK()
- Column#2 - time spent before the change
- Column#3 - time spent after the change
# 1-byte characters: Notable improvement
|
strcmp('','') 0.175 0.103
|
strcmp('a','a') 0.233 0.116
|
strcmp('aa','aa') 0.180 0.114
|
strcmp('aaa','aaa') 0.238 0.122
|
strcmp('aaaa','aaaa') 0.192 0.127
|
strcmp('aaaaa','aaaaa') 0.256 0.137
|
strcmp('aaaaaa','aaaaaa') 0.207 0.140
|
strcmp('aaaaaaa','aaaaaaa') 0.271 0.147
|
strcmp('aaaaaaaa','aaaaaaaa') 0.220 0.152
|
strcmp('aaaaaaaaa','aaaaaaaaa') 0.295 0.162
|
|
# 2-byte characters: Notable improvement
|
strcmp('я','я') 0.184 0.115
|
strcmp('яя','яя') 0.193 0.127
|
strcmp('яяя','яяя') 0.208 0.141
|
strcmp('яяяя','яяяя') 0.219 0.154
|
strcmp('яяяяя','яяяяя') 0.231 0.167
|
|
# Two 1-byte characters vs 2-byte character with expansion: Notable improvement
|
strcmp('ss','ß') 0.185 0.116
|
strcmp('ssss','ßß') 0.199 0.127
|
strcmp('ssssss','ßßß') 0.208 0.141
|
|
-- 3-byte characters: Insignificant slowdown
|
strcmp('ắ','ắ') 0.306 0.317
|
strcmp('ắắ','ắắ') 0.433 0.450
|
strcmp('ắắắ','ắắắ') 0.571 0.582
|
strcmp('ắắắắ','ắắắắ') 0.698 0.711
|
strcmp('ắắắắắ','ắắắắắ') 0.822 0.839
|
|
-- 4-byte characters: Insignificant slowdown
|
strcmp('','') 0.331 0.342
|
strcmp('','') 0.479 0.501
|
strcmp('','') 0.639 0.649
|
strcmp('','') 0.787 0.801
|
strcmp('','') 1.043 1.053
|
|