Details
Type: Bug
Status: In Review
Priority: Critical
Resolution: Unresolved
11.8, 12.1, 12.2, 12.3
None
Related to performance
Description
utf8mb4 is about 21 percent slower than latin1 on the rt_order_ranges test of mariadb-benchmarks.
The steps below reproduce the problem.
Installation steps:
- Install a release build of 11.8
- Start the server with max-length-for-sort-data=2048. With the default value of 1024 the utf8mb4 results are much worse, because the data does not fit into the buffer and the server chooses a different plan.
- Install sysbench, e.g.:
sudo dnf install sysbench
- Install mariadb-benchmarks
git clone https://github.com/hgxl64/mariadb-benchmarks
Data preparation
- Create two databases:
mariadb << END
CREATE DATABASE sbtest_latin1_char CHARACTER SET latin1;
CREATE DATABASE sbtest_utf8mb4_char CHARACTER SET utf8mb4;
END
- Change the current directory to the regressiontest/lua directory of mariadb-benchmarks
- Prepare the latin1 database for sysbench testing
sysbench rt_order_ranges.lua --tables=1 --table-size=1000000 \
--mysql-socket=/tmp/mysql.sock \
--mysql-user=bar --mysql-db=sbtest_latin1_char prepare
- Prepare the utf8mb4 database for sysbench testing
Note that we create the table with no records and then copy the records from the latin1 table, to make sure the data is identical:
sysbench rt_order_ranges.lua --tables=1 --table-size=0 \
--mysql-socket=/tmp/mysql.sock \
--mysql-user=bar --mysql-db=sbtest_utf8mb4_char prepare
mariadb << END
INSERT INTO sbtest_utf8mb4_char.sbtest1
SELECT * FROM sbtest_latin1_char.sbtest1;
END
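Before benchmarking, it may be worth confirming that both tables hold the same number of rows and that their character columns really use different character sets. The following is a minimal sketch, not part of the original report: check_sql is a hypothetical helper that only prints SQL, and it assumes the table is named sbtest1 as in the steps above.

```shell
# Hypothetical helper: emit SQL that reports the row count and the
# character set of each character column for the schema given as $1.
check_sql() {
    cat <<END
SELECT COUNT(*) FROM $1.sbtest1;
SELECT COLUMN_NAME, CHARACTER_SET_NAME
  FROM information_schema.COLUMNS
 WHERE TABLE_SCHEMA = '$1' AND TABLE_NAME = 'sbtest1'
   AND CHARACTER_SET_NAME IS NOT NULL;
END
}

# Print the statements for both databases created above.
for db in sbtest_latin1_char sbtest_utf8mb4_char; do
    check_sql "$db"
done
```

Piping the output into mariadb runs the checks. The two row counts should match, and the reported character sets should be latin1 and utf8mb4 respectively.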
Preparing to run benchmarks
- Create a shell script run.sh with this code in the regressiontest/lua directory of mariadb-benchmarks:
#!/bin/sh
echo "# Running COMB $COMB"
sysbench rt_order_ranges.lua --tables=1 --table-size=1000000 \
--rand-type=uniform --range-size=1000 --events=5000 --time=0 \
--mysql-socket=/tmp/mysql.sock \
--threads=36 \
--mysql-user=bar \
--mysql-db=sbtest_$1 \
run >$1.out
- Create a shell script run2.sh in the same directory:
#!/bin/bash
if [ -z "$1" ]
then
COMB=latin1_char
else
COMB=$1
fi
echo "# COMB=$COMB"
export COMB
rm -rf res_$COMB
rm -f perf.data perf.script perf.data.old
perf record -a -F 99 -g -p $(pgrep -x mariadbd) -- ./run.sh $COMB 2>$COMB.out
perf script > perf.script
stackcollapse-perf.pl perf.script | flamegraph.pl > $COMB.svg
rm -f perf.data perf.script perf.data.old
mkdir res_$COMB
mv $COMB.out $COMB.svg res_$COMB
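The two scripts depend on several external programs: sysbench, perf, pgrep, and the stackcollapse-perf.pl and flamegraph.pl scripts from the FlameGraph project. A small preflight check along these lines could catch a missing tool before a run. This is only a sketch; missing_tools is a hypothetical helper name, not part of mariadb-benchmarks.

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools invoked by run.sh and run2.sh.
missing=$(missing_tools sysbench perf pgrep stackcollapse-perf.pl flamegraph.pl)
if [ -n "$missing" ]; then
    # Word splitting is intended here: one name per word on a single line.
    echo "missing tools:" $missing >&2
fi
```

If nothing is printed, all five tools were found and run2.sh should be able to produce both the .out file and the flame graph.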
Running and analyzing benchmarks
- Run benchmarks:
./run2.sh latin1_char
./run2.sh utf8mb4_char
- Compare the results:
grep queries: res*/*.out
On my desktop it displays these results:
res_latin1_char/latin1_char.out: queries: 5036 (23382.17 per sec.)
res_utf8mb4_char/utf8mb4_char.out: queries: 5036 (19226.93 per sec.)
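This means utf8mb4 is about 21 percent slower than latin1, taking the ratio of queries per second (latin1 over utf8mb4):
23382.17 ÷ 19226.93 = 1.216115625
Measured the other way, utf8mb4 throughput is about 17.8 percent below latin1.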
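The comparison above can also be scripted so that repeated runs are evaluated the same way. The sketch below is an illustration only: qps and compare_qps are hypothetical helper names, and the parsing assumes the "queries: N (X per sec.)" line format shown in the results above.

```shell
# Hypothetical helper: print the queries-per-second figure found in
# sysbench output read from stdin.
qps() {
    sed -n 's/.*queries:[^(]*(\([0-9.][0-9.]*\) per sec\.).*/\1/p'
}

# Hypothetical helper: given two throughput figures (baseline first),
# print their ratio and the relative drop of the second one in percent.
compare_qps() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        printf "ratio=%.3f drop=%.1f%%\n", a / b, (1 - b / a) * 100
    }'
}

# Example with the figures from this report:
compare_qps 23382.17 19226.93   # prints: ratio=1.216 drop=17.8%
```

With real result files, the figures could be extracted first, for example latin1=$(qps < res_latin1_char/latin1_char.out), and then passed to compare_qps.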