[MDEV-37863] utf8mb4 is 21% slower on rt_order_ranges benchmark

Details

Type: Bug
Status: In Review
Priority: Critical
Resolution: Unresolved
Affects Version/s: 11.8, 12.1(EOL), 12.2, 12.3
Fix Version/s: 12.3
Component/s: OTHER
Labels: None
Bug Category: Related to performance
Sprint: Q2/2026 Server Maintenance

Description

utf8mb4 is about 21 percent slower than latin1 on the rt_order_ranges test of mariadb-benchmarks.

The instructions below describe how to reproduce the problem.

Installation steps:

Install a release build of 11.8.
Start the server. Note: max-length-for-sort-data must be set to 2048. With the default value of 1024, the results with utf8mb4 are much worse, because the data does not fit into the buffer and the server chooses a different plan.
Install sysbench, e.g.:
sudo dnf install sysbench

Install mariadb-benchmarks

git clone https://github.com/hgxl64/mariadb-benchmarks
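The server start step above requires max_length_for_sort_data to be 2048. A minimal sketch for checking this before benchmarking, assuming the mariadb client can connect with default credentials (add socket/user options as needed); check_sort_data_len is a hypothetical helper, not part of mariadb-benchmarks.

```sh
# Hypothetical helper (not part of mariadb-benchmarks): succeed only if the
# server's global max_length_for_sort_data is at least the required value.
check_sort_data_len() {
  required=${1:-2048}
  # -N: no column header, -B: batch output, so only the bare value is printed
  actual=$(mariadb -N -B -e "SELECT @@GLOBAL.max_length_for_sort_data") || return 2
  if [ "$actual" -ge "$required" ]; then
    echo "max_length_for_sort_data=$actual (ok)"
  else
    echo "max_length_for_sort_data=$actual, but at least $required is needed" >&2
    return 1
  fi
}
```

For example: check_sort_data_len 2048 || exit 1. Since max_length_for_sort_data is a dynamic variable, it can also be raised at runtime with SET GLOBAL max_length_for_sort_data=2048; this only affects new connections and does not survive a restart, so passing it on the mariadbd command line or in an option file is more reliable.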

Data preparation

Create two databases:

mariadb << END
CREATE DATABASE sbtest_latin1_char CHARACTER SET latin1;
CREATE DATABASE sbtest_utf8mb4_char CHARACTER SET utf8mb4;
END

Change the current directory to the regressiontest/lua directory of mariadb-benchmarks

Prepare the latin1 database for sysbench testing

sysbench rt_order_ranges.lua --tables=1 --table-size=1000000 \
  --mysql-socket=/tmp/mysql.sock \
  --mysql-user=bar --mysql-db=sbtest_latin1_char prepare

Prepare the utf8mb4 database for sysbench testing
Note that the table is created empty and the rows are then copied from the latin1 table, so that both tables contain exactly the same data:

sysbench rt_order_ranges.lua --tables=1 --table-size=0 \
  --mysql-socket=/tmp/mysql.sock \
  --mysql-user=bar --mysql-db=sbtest_utf8mb4_char prepare

mariadb << END
INSERT INTO sbtest_utf8mb4_char.sbtest1
  SELECT * FROM sbtest_latin1_char.sbtest1;
END
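Because the comparison relies on both tables holding the same rows, a quick sanity check after the copy can help. A minimal sketch, assuming the mariadb client connects with default credentials; count_rows and check_copy are hypothetical helpers. Equal row counts do not prove identical contents, only that the INSERT ... SELECT copied every row.

```sh
# Hypothetical helper: print the number of rows in <database>.sbtest1.
count_rows() {
  mariadb -N -B -e "SELECT COUNT(*) FROM $1.sbtest1"
}

# Hypothetical helper: compare the row counts of the latin1 source table
# and the utf8mb4 copy; returns 1 on mismatch, 2 if a query fails.
check_copy() {
  src=$(count_rows sbtest_latin1_char) || return 2
  dst=$(count_rows sbtest_utf8mb4_char) || return 2
  if [ "$src" = "$dst" ]; then
    echo "row counts match: $src"
  else
    echo "row count mismatch: latin1=$src utf8mb4=$dst" >&2
    return 1
  fi
}
```

For example: check_copy || exit 1.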

Preparing to run benchmarks

Create a shell script run.sh with the following code in the regressiontest/lua directory of mariadb-benchmarks:

#!/bin/sh
# Runs rt_order_ranges against database sbtest_$1 and writes the sysbench report to $1.out
echo "# Running COMB $COMB"
sysbench rt_order_ranges.lua --tables=1 --table-size=1000000 \
  --rand-type=uniform --range-size=1000 --events=5000 --time=0 \
  --mysql-socket=/tmp/mysql.sock \
  --threads=36 \
  --mysql-user=bar \
  --mysql-db=sbtest_$1 \
  run >$1.out

Create a shell script run2.sh in the same directory, and make both scripts executable (chmod +x run.sh run2.sh), since run2.sh invokes ./run.sh:

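#!/bin/bash
# Usage: ./run2.sh [latin1_char|utf8mb4_char]   (default: latin1_char)
if [ "x$1" = 'x' ]
then
  COMB=latin1_char
else
  COMB=$1
fi
echo "# COMB=$COMB"
export COMB
rm -rf res_$COMB
rm -f perf.data perf.script perf.data.old
# Profile mariadbd (99 samples/sec, with call graphs) while run.sh executes the benchmark
perf record -a -F 99 -g -p $(pgrep -x mariadbd) -- ./run.sh $COMB 2>$COMB.out
# Turn the samples into a flame graph; stackcollapse-perf.pl and flamegraph.pl
# come from Brendan Gregg's FlameGraph repository and must be in PATH
perf script > perf.script
stackcollapse-perf.pl perf.script | flamegraph.pl > $COMB.svg
rm -f perf.data perf.script perf.data.old
mkdir res_$COMB
mv $COMB.out $COMB.svg res_$COMB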
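run.sh and run2.sh depend on several external tools. A minimal preflight sketch that reports whichever of them is missing from PATH; missing_tools is a hypothetical helper, and the tool list passed to it below is simply the set of commands the two scripts call.

```sh
# Hypothetical preflight helper: print "missing: <name>" for every command
# given as an argument that cannot be found, and return 1 if any is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: missing_tools sysbench perf pgrep stackcollapse-perf.pl flamegraph.pl || exit 1. It is also worth confirming that pgrep -x mariadbd prints a PID, because perf record -p needs a running server process.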

Running and analyzing benchmarks

Run benchmarks:

./run2.sh latin1_char

./run2.sh utf8mb4_char

Comparing results
grep queries: res*/*.out

On my desktop it displays these results:

res_latin1_char/latin1_char.out:    queries:      5036   (23382.17 per sec.)

res_utf8mb4_char/utf8mb4_char.out:    queries:    5036   (19226.93 per sec.)

This means latin1 achieves about 1.22 times the throughput of utf8mb4, i.e. utf8mb4 is about 21 percent slower than latin1 (ratio of queries per second):

23382.17÷19226.93 = 1.216115625
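The same ratio can be computed directly from the result files. A minimal sketch, assuming the sysbench report format shown above (a line whose first field is "queries:", followed by the count and the rate in parentheses); qps and qps_ratio are hypothetical helpers.

```sh
# Hypothetical helper: print the queries-per-second value from a sysbench
# report, i.e. the number inside "(... per sec.)" on the "queries:" line.
qps() {
  awk '$1 == "queries:" { sub(/^\(/, "", $3); print $3; exit }' "$1"
}

# Hypothetical helper: print qps(file1) / qps(file2) with 4 decimals.
qps_ratio() {
  a=$(qps "$1")
  b=$(qps "$2")
  # Bail out if either file has no "queries:" line
  [ -n "$a" ] && [ -n "$b" ] || return 1
  awk -v a="$a" -v b="$b" 'BEGIN { printf "%.4f\n", a / b }'
}
```

With the results above, qps_ratio res_latin1_char/latin1_char.out res_utf8mb4_char/utf8mb4_char.out prints 1.2161. Matching on the first field rather than grepping for the substring avoids picking up the "queries performed:" header in the same report.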


People

Assignee: Sergei Golubchik

Reporter: Alexander Barkov

Votes: 0

Watchers: 3

Dates

Created: 2025-10-15 05:44

Updated: 2026-01-23 15:54
