[MDEV-32340] Improve performance of my_uca_level_booster_simple_prefix_cmp Created: 2023-10-03  Updated: 2023-12-22

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 11.5

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File MDEV-32340.v07.diff    
Issue Links:
Relates
relates to MDEV-27265 Improve contraction performance in UC... Closed
relates to MDEV-27266 Improve UCA collation performance for... Closed

 Description   

The patch for MDEV-27266 introduced a new function my_uca_level_booster_simple_prefix_cmp(). It calculates the length of a simple prefix.
It efficiently skips equal prefixes consisting of simple mapping when one or two bytes produce one or two weights.

Whenever it comes to a position with non-simple mapping or with non-equal weight, it breaks and returns the length of the equal prefix.
The caller function after that initializes two instances of my_uca_scanner and goes to a less efficient comparison loop.

In cases when my_uca_level_booster_simple_prefix_cmp() returns because of a non-simple mapping found, going to the slower loop is OK.

But in cases when my_uca_level_booster_simple_prefix_cmp() breaks at some position because:

  • both strings have simple mappings at this position
  • but weights are different
    going to the slower loop is not really necessary. The weights can be compared right inside my_uca_level_booster_simple_prefix_cmp().

Let's do the following:
1. Rename the function and change its signature from

static size_t
my_uca_level_booster_equal_prefix_length(const MY_UCA_LEVEL_BOOSTER *booster,
                                         const uchar *s, size_t slen,
                                         const uchar *t, size_t tlen)

to

static int
my_uca_level_booster_simple_prefix_cmp(my_uca_scanner *sscanner,
                                       my_uca_scanner *tscanner,
                                       const MY_UCA_LEVEL_BOOSTER *booster)

so the function returns the difference between two strings as follows:

  • a negative number if the string pointed by sscanner appeared to be slower
  • a positive number if the string pointed by sscanner appeared to be greater
  • 0 if a complex mapping was found

The caller code will change as follows:

  • The two my_uca_scanner instances will now be initialized before calling my_uca_level_booster_simple_prefix_cmp()
  • On a non-zero value returned from my_uca_level_booster_simple_prefix_cmp(), the caller will exit
  • On zero returned my_uca_level_booster_simple_prefix_cmp() the caller will continue with a slower loop

The attached patch MDEV-32340.v07.diff is a working prototype.

Benchmarking

Micro-benchmarking can be done using this SQL script:

-- Warning up
SET NAMES utf8mb4 COLLATE utf8mb4_uca1400_ai_ci;
DO BENCHMARK(10000000,strcmp('xxxx','xxxx'));
 
-- Benchmarking
DO BENCHMARK(10000000,strcmp('',''));
DO BENCHMARK(10000000,strcmp('a','a'));
DO BENCHMARK(10000000,strcmp('aa','aa'));
DO BENCHMARK(10000000,strcmp('aaa','aaa'));
DO BENCHMARK(10000000,strcmp('aaaa','aaaa'));
DO BENCHMARK(10000000,strcmp('aaaaa','aaaaa'));
DO BENCHMARK(10000000,strcmp('aaaaaa','aaaaaa'));
DO BENCHMARK(10000000,strcmp('aaaaaaa','aaaaaaa'));
DO BENCHMARK(10000000,strcmp('aaaaaaaa','aaaaaaaa'));
DO BENCHMARK(10000000,strcmp('aaaaaaaaa','aaaaaaaaa'));
DO BENCHMARK(10000000,strcmp('я','я'));
DO BENCHMARK(10000000,strcmp('яя','яя'));
DO BENCHMARK(10000000,strcmp('яяя','яяя'));
DO BENCHMARK(10000000,strcmp('яяяя','яяяя'));
DO BENCHMARK(10000000,strcmp('яяяяя','яяяяя'));
DO BENCHMARK(10000000,strcmp('ss','ß'));
DO BENCHMARK(10000000,strcmp('ssss','ßß'));
DO BENCHMARK(10000000,strcmp('ssssss','ßßß'));
DO BENCHMARK(10000000,strcmp('ắ','ắ'));
DO BENCHMARK(10000000,strcmp('ắắ','ắắ'));
DO BENCHMARK(10000000,strcmp('ắắắ','ắắắ'));
DO BENCHMARK(10000000,strcmp('ắắắắ','ắắắắ'));
DO BENCHMARK(10000000,strcmp('ắắắắắ','ắắắắắ'));
DO BENCHMARK(10000000,strcmp('��','��'));
DO BENCHMARK(10000000,strcmp('����','����'));
DO BENCHMARK(10000000,strcmp('������','������'));
DO BENCHMARK(10000000,strcmp('��������','��������'));
DO BENCHMARK(10000000,strcmp('����������','����������'));

Benchmarking shows a very good performance improvement on 1-byte and 2-byte characters with insignificant slowdown on 3-byte and 4-byte characters:

  • Column#1 - the function passed to BENCHMARK()
  • Column#2 - time spent before the change
  • Column#3 - time spent after the change

# 1-byte characters: Notable improvement
strcmp('','')                      0.175  0.103
strcmp('a','a')                    0.233  0.116
strcmp('aa','aa')                  0.180  0.114
strcmp('aaa','aaa')                0.238  0.122
strcmp('aaaa','aaaa')              0.192  0.127
strcmp('aaaaa','aaaaa')            0.256  0.137
strcmp('aaaaaa','aaaaaa')          0.207  0.140
strcmp('aaaaaaa','aaaaaaa')        0.271  0.147 
strcmp('aaaaaaaa','aaaaaaaa')      0.220  0.152 
strcmp('aaaaaaaaa','aaaaaaaaa')    0.295  0.162
 
# 2-byte characters: Notable improvement
strcmp('я','я')                    0.184 0.115
strcmp('яя','яя')                  0.193 0.127
strcmp('яяя','яяя')                0.208 0.141
strcmp('яяяя','яяяя')              0.219 0.154
strcmp('яяяяя','яяяяя')            0.231 0.167
 
# Two 1-byte characters vs 2-byte character with expansion: Notable improvement
strcmp('ss','ß')                   0.185 0.116
strcmp('ssss','ßß')                0.199 0.127
strcmp('ssssss','ßßß')             0.208 0.141
 
-- 3-byte characters: Insignificant slowdown
strcmp('ắ','ắ')                    0.306 0.317
strcmp('ắắ','ắắ')                  0.433 0.450
strcmp('ắắắ','ắắắ')                0.571 0.582
strcmp('ắắắắ','ắắắắ')              0.698 0.711
strcmp('ắắắắắ','ắắắắắ')            0.822 0.839
 
-- 4-byte characters: Insignificant slowdown
strcmp('��','��')                  0.331 0.342
strcmp('����','����')              0.479 0.501
strcmp('������','������')          0.639 0.649
strcmp('��������','��������')      0.787 0.801
strcmp('����������','����������')  1.043 1.053


Generated at Thu Feb 08 10:30:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.