Type:
Task
Priority:
Major
Resolution:
Unresolved
Fix Version/s:
None
The patch for MDEV-27266 introduced a new function my_uca_level_booster_simple_prefix_cmp(). It calculates the length of a simple prefix.
It efficiently skips equal prefixes consisting of simple mapping when one or two bytes produce one or two weights.
Whenever it comes to a position with non-simple mapping or with non-equal weight, it breaks and returns the length of the equal prefix.
The caller function after that initializes two instances of my_uca_scanner and goes to a less efficient comparison loop.
In cases when my_uca_level_booster_simple_prefix_cmp() returns because of a non-simple mapping found, going to the slower loop is OK.
But in cases when my_uca_level_booster_simple_prefix_cmp() breaks at some position because:
both strings have simple mappings at this position
but weights are different
going to the slower loop is not really necessary. The weights can be compared right inside my_uca_level_booster_simple_prefix_cmp().
Let's do the following:
1. Rename the function and change its signature from
static size_t
my_uca_level_booster_equal_prefix_length( const MY_UCA_LEVEL_BOOSTER *booster,
const uchar *s, size_t slen,
const uchar *t, size_t tlen)
to
static int
my_uca_level_booster_simple_prefix_cmp(my_uca_scanner *sscanner,
my_uca_scanner *tscanner,
const MY_UCA_LEVEL_BOOSTER *booster)
so the function returns the difference between two strings as follows:
a negative number if the string pointed by sscanner appeared to be slower
a positive number if the string pointed by sscanner appeared to be greater
0 if a complex mapping was found
The caller code will change as follows:
The two my_uca_scanner instances will now be initialized before calling my_uca_level_booster_simple_prefix_cmp()
On a non-zero value returned from my_uca_level_booster_simple_prefix_cmp(), the caller will exit
On zero returned my_uca_level_booster_simple_prefix_cmp() the caller will continue with a slower loop
The attached patch MDEV-32340 .v07.diff is a working prototype.
Benchmarking
Micro-benchmarking can be done using this SQL script:
-- Warning up
SET NAMES utf8mb4 COLLATE utf8mb4_uca1400_ai_ci;
DO BENCHMARK(10000000,strcmp( 'xxxx' , 'xxxx' ));
-- Benchmarking
DO BENCHMARK(10000000,strcmp( '' , '' ));
DO BENCHMARK(10000000,strcmp( 'a' , 'a' ));
DO BENCHMARK(10000000,strcmp( 'aa' , 'aa' ));
DO BENCHMARK(10000000,strcmp( 'aaa' , 'aaa' ));
DO BENCHMARK(10000000,strcmp( 'aaaa' , 'aaaa' ));
DO BENCHMARK(10000000,strcmp( 'aaaaa' , 'aaaaa' ));
DO BENCHMARK(10000000,strcmp( 'aaaaaa' , 'aaaaaa' ));
DO BENCHMARK(10000000,strcmp( 'aaaaaaa' , 'aaaaaaa' ));
DO BENCHMARK(10000000,strcmp( 'aaaaaaaa' , 'aaaaaaaa' ));
DO BENCHMARK(10000000,strcmp( 'aaaaaaaaa' , 'aaaaaaaaa' ));
DO BENCHMARK(10000000,strcmp( 'я' , 'я' ));
DO BENCHMARK(10000000,strcmp( 'яя' , 'яя' ));
DO BENCHMARK(10000000,strcmp( 'яяя' , 'яяя' ));
DO BENCHMARK(10000000,strcmp( 'яяяя' , 'яяяя' ));
DO BENCHMARK(10000000,strcmp( 'яяяяя' , 'яяяяя' ));
DO BENCHMARK(10000000,strcmp( 'ss' , 'ß' ));
DO BENCHMARK(10000000,strcmp( 'ssss' , 'ßß' ));
DO BENCHMARK(10000000,strcmp( 'ssssss' , 'ßßß' ));
DO BENCHMARK(10000000,strcmp( 'ắ' , 'ắ' ));
DO BENCHMARK(10000000,strcmp( 'ắắ' , 'ắắ' ));
DO BENCHMARK(10000000,strcmp( 'ắắắ' , 'ắắắ' ));
DO BENCHMARK(10000000,strcmp( 'ắắắắ' , 'ắắắắ' ));
DO BENCHMARK(10000000,strcmp( 'ắắắắắ' , 'ắắắắắ' ));
DO BENCHMARK(10000000,strcmp( '👍' , '👍' ));
DO BENCHMARK(10000000,strcmp( '👍👍' , '👍👍' ));
DO BENCHMARK(10000000,strcmp( '👍👍👍' , '👍👍👍' ));
DO BENCHMARK(10000000,strcmp( '👍👍👍👍' , '👍👍👍👍' ));
DO BENCHMARK(10000000,strcmp( '👍👍👍👍👍' , '👍👍👍👍👍' ));
Benchmarking shows a very good performance improvement on 1-byte and 2-byte characters with insignificant slowdown on 3-byte and 4-byte characters:
Column#1 - the function passed to BENCHMARK()
Column#2 - time spent before the change
Column#3 - time spent after the change
# 1-byte characters: Notable improvement
strcmp('','') 0.175 0.103
strcmp('a','a') 0.233 0.116
strcmp('aa','aa') 0.180 0.114
strcmp('aaa','aaa') 0.238 0.122
strcmp('aaaa','aaaa') 0.192 0.127
strcmp('aaaaa','aaaaa') 0.256 0.137
strcmp('aaaaaa','aaaaaa') 0.207 0.140
strcmp('aaaaaaa','aaaaaaa') 0.271 0.147
strcmp('aaaaaaaa','aaaaaaaa') 0.220 0.152
strcmp('aaaaaaaaa','aaaaaaaaa') 0.295 0.162
# 2-byte characters: Notable improvement
strcmp('я','я') 0.184 0.115
strcmp('яя','яя') 0.193 0.127
strcmp('яяя','яяя') 0.208 0.141
strcmp('яяяя','яяяя') 0.219 0.154
strcmp('яяяяя','яяяяя') 0.231 0.167
# Two 1-byte characters vs 2-byte character with expansion: Notable improvement
strcmp('ss','ß') 0.185 0.116
strcmp('ssss','ßß') 0.199 0.127
strcmp('ssssss','ßßß') 0.208 0.141
-- 3-byte characters: Insignificant slowdown
strcmp('ắ','ắ') 0.306 0.317
strcmp('ắắ','ắắ') 0.433 0.450
strcmp('ắắắ','ắắắ') 0.571 0.582
strcmp('ắắắắ','ắắắắ') 0.698 0.711
strcmp('ắắắắắ','ắắắắắ') 0.822 0.839
-- 4-byte characters: Insignificant slowdown
strcmp('👍','👍') 0.331 0.342
strcmp('👍👍','👍👍') 0.479 0.501
strcmp('👍👍👍','👍👍👍') 0.639 0.649
strcmp('👍👍👍👍','👍👍👍👍') 0.787 0.801
strcmp('👍👍👍👍👍','👍👍👍👍👍') 1.043 1.053
{"report":{"fcp":924.0999999046326,"ttfb":162.5,"pageVisibility":"visible","entityId":125517,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"bc55681a-9eaa-4153-8212-74be0d1961fc","navigationType":0,"readyForUser":1014.2999999523163,"redirectCount":0,"resourceLoadedEnd":1302.0999999046326,"resourceLoadedStart":171,"resourceTiming":[{"duration":314.90000009536743,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2bu7/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":171,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":171,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":485.90000009536743,"responseStart":0,"secureConnectionStart":0},{"duration":314.7000000476837,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2bu7/820016/12ta74/8679b4946efa1a0bb029a3a22206fb5d/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":171.20000004768372,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":171.20000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":485.90000009536743,"responseStart":0,"secureConnectionStart":0},{"duration":324.19999980926514,"initiatorType":"script","name":"https://jira.mariadb.org/s/fbf975c0cce4b1abf04784eeae9ba1f4-CDN/lu2bu7/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":171.40000009536743,"connectEnd":171.40000009536743,"connectStart":171.40000009536743,"domainLookupEnd":171.40000009536743,"domainLookupStart":171.40000009536743,"fetchStart":171.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":171.40000009536743,"responseEnd":495.59999990463257,"responseStart":495.59999990463257,"secureConnectionStart":171.40000009536743},{"duration":350.7999999523163,"initiatorType":"script","name":"https://jira.mariadb.org/s/099b33461394b8015fc36c0a4b96e19f-CDN/lu2bu7/820016/12ta74/8679b4946efa1a0bb029a3a22206fb5d/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":171.90000009536743,"connectEnd":171.90000009536743,"connectStart":171.90000009536743,"domainLookupEnd":171.90000009536743,"domainLookupStart":171.90000009536743,"fetchStart":171.90000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":171.90000009536743,"responseEnd":522.7000000476837,"responseStart":522.7000000476837,"secureConnectionStart":171.90000009536743},{"duration":354.30000019073486,"initiatorType":"script","name":"https://jira.mariadb.org/s/94c15bff32baef80f4096a08aceae8bc-CDN/lu2bu7/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":172.09999990463257,"connectEnd":172.09999990463257,"connectStart":172.09999990463257,"domainLookupEnd":172.09999990463257,"domainLookupStart":172.09999990463257,"fetchStart":172.09999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":172.09999990463257,"responseEnd":526.4000000953674,"responseStart":526.4000000953674,"secureConnectionStart":172.09999990463257},{"duration":354.80000019073486,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":172.09999990463257,"connectEnd":172.09999990463257,"connectStart":172.09999990463257,"domainLookupEnd":172.09999990463257,"domainLookupStart":172.09999990463257,"fetchStart":172.09999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":172.09999990463257,"responseEnd":526.9000000953674,"responseStart":526.9000000953674,"secureConnectionStart":172.09999990463257},{"duration":355.09999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":172.20000004768372,"connectEnd":172.20000004768372,"connectStart":172.20000004768372,"domainLookupEnd":172.20000004768372,"domainLookupStart":172.20000004768372,"fetchStart":172.20000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":172.20000004768372,"responseEnd":527.2999999523163,"responseStart":527.2999999523163,"secureConnectionStart":172.20000004768372},{"duration":355.7000000476837,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2bu7/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":172.29999995231628,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":172.29999995231628,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":528,"responseStart":0,"secureConnectionStart":0},{"duration":355.39999985694885,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":172.40000009536743,"connectEnd":172.40000009536743,"connectStart":172.40000009536743,"domainLookupEnd":172.40000009536743,"domainLookupStart":172.40000009536743,"fetchStart":172.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":172.40000009536743,"responseEnd":527.7999999523163,"responseStart":527.7999999523163,"secureConnectionStart":172.40000009536743},{"duration":355.7000000476837,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2bu7/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":172.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":172.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":528.2000000476837,"responseStart":0,"secureConnectionStart":0},{"duration":355.7000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/3339d87fa2538a859872f2df449bf8d0-CDN/lu2bu7/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":172.59999990463257,"connectEnd":172.59999990463257,"connectStart":172.59999990463257,"domainLookupEnd":172.59999990463257,"domainLookupStart":172.59999990463257,"fetchStart":172.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":172.59999990463257,"responseEnd":528.2999999523163,"responseStart":528.2999999523163,"secureConnectionStart":172.59999990463257},{"duration":1117.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":183.70000004768372,"connectEnd":183.70000004768372,"connectStart":183.70000004768372,"domainLookupEnd":183.70000004768372,"domainLookupStart":183.70000004768372,"fetchStart":183.70000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":183.70000004768372,"responseEnd":1301.2000000476837,"responseStart":1301.2000000476837,"secureConnectionStart":183.70000004768372},{"duration":1118.3999998569489,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":183.70000004768372,"connectEnd":183.70000004768372,"connectStart":183.70000004768372,"domainLookupEnd":183.70000004768372,"domainLookupStart":183.70000004768372,"fetchStart":183.70000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":183.70000004768372,"responseEnd":1302.0999999046326,"responseStart":1302.0999999046326,"secureConnectionStart":183.70000004768372},{"duration":570.7000000476837,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":726,"connectEnd":726,"connectStart":726,"domainLookupEnd":726,"domainLookupStart":726,"fetchStart":726,"redirectEnd":0,"redirectStart":0,"requestStart":726,"responseEnd":1296.7000000476837,"responseStart":1296.7000000476837,"secureConnectionStart":726},{"duration":440.89999985694885,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":902.9000000953674,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":902.9000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1343.7999999523163,"responseStart":0,"secureConnectionStart":0}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":3,"responseStart":162,"responseEnd":183,"domLoading":166,"domInteractive":1356,"domContentLoadedEventStart":1356,"domContentLoadedEventEnd":1395,"domComplete":2071,"loadEventStart":2071,"loadEventEnd":2072,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1341},{"name":"bigPipe.sidebar-id.end","time":1341.7999999523163},{"name":"bigPipe.activity-panel-pipe-id.start","time":1342},{"name":"bigPipe.activity-panel-pipe-id.end","time":1344.4000000953674},{"name":"activityTabFullyLoaded","time":1403.2000000476837}],"measures":[],"correlationId":"2ccd1565f2a040","effectiveType":"4g","downlink":9.5,"rtt":0,"serverDuration":89,"dbReadsTimeInMs":9,"dbConnsTimeInMs":16,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}