There are two issues with collation performance:
1. Scalability with the number of threads. As this machine can run 32 hardware threads, performance should peak at 32 benchmark threads and then stay stable at higher thread counts, but it often doesn't. Example with utf8_general_ci:
# data set 01 -> mariadb-10.1.36
# data set 02 -> mariadb-10.2.18
# data set 03 -> mariadb-10.3.9
# data set 04 -> mysql-5.6.36
# data set 05 -> mysql-5.7.21
# data set 06 -> mysql-8.0.11

# read-only (short ranges)
#thd       01       02       03       04       05       06
   1   7308.8   8184.7   8020.1   6151.6   7234.8   7262.0
   2    14090    15229    15308    12181    13546    14339
   4    25085    28512    29831    21528    26559    27801
   8    43274    51316    51550    39246    43164    51494
  16    76782    97134    92415    71065    75132    98776
  32   104524   143690   137077   100247   117379   144925
  64   105551   121672   123965   101364   119981   144997
 128   101807   123353   121965   101324   114877   144768
 256   109214   122711   121956   100715   113369   144280
There is a clear peak at 32 threads for 10.3, but throughput drops after that.
2. Performance relative to MySQL 8.0, and in some cases also relative to earlier MariaDB releases. In the example above, 10.2 is faster than 10.3 at the 32-thread peak, and 8.0 is a lot faster, especially above 32 threads.
Both issues seem to be more pronounced for the *_unicode_ci collations (compared to *_general_ci) and for utf8mb4 compared to utf8.
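Both observations can be checked directly against the table. The following is a minimal Python sketch with the numbers copied from the read-only (short ranges) data above; the helper names peak, retention and ratio_to are hypothetical, and using 32 threads as the reference point is an assumption based on the machine's 32 hardware threads.

```python
# Numbers copied from the "read-only (short ranges)" table above.
THREADS = [1, 2, 4, 8, 16, 32, 64, 128, 256]
DATA = {
    "mariadb-10.1.36": [7308.8, 14090, 25085, 43274, 76782, 104524, 105551, 101807, 109214],
    "mariadb-10.2.18": [8184.7, 15229, 28512, 51316, 97134, 143690, 121672, 123353, 122711],
    "mariadb-10.3.9": [8020.1, 15308, 29831, 51550, 92415, 137077, 123965, 121965, 121956],
    "mysql-5.6.36": [6151.6, 12181, 21528, 39246, 71065, 100247, 101364, 101324, 100715],
    "mysql-5.7.21": [7234.8, 13546, 26559, 43164, 75132, 117379, 119981, 114877, 113369],
    "mysql-8.0.11": [7262.0, 14339, 27801, 51494, 98776, 144925, 144997, 144768, 144280],
}


def peak(name: str) -> tuple[int, float]:
    """Thread count and value of the highest throughput for one data set."""
    values = DATA[name]
    i = max(range(len(values)), key=values.__getitem__)
    return THREADS[i], values[i]


def retention(name: str, hw_threads: int = 32) -> float:
    """Throughput at the largest thread count relative to hw_threads.
    A value near 1.0 means performance stays stable past saturation."""
    values = DATA[name]
    return values[-1] / values[THREADS.index(hw_threads)]


def ratio_to(reference: str, name: str, threads: int) -> float:
    """How many times as fast `reference` is as `name` at a given thread count."""
    i = THREADS.index(threads)
    return DATA[reference][i] / DATA[name][i]


if __name__ == "__main__":
    for name in DATA:
        t, v = peak(name)
        print(f"{name:16} peak {v:9.0f} at {t:3d} threads, "
              f"256 vs 32 threads: {retention(name):.2f}")
    for t in (32, 256):
        print(f"8.0 / 10.3 at {t} threads: "
              f"{ratio_to('mysql-8.0.11', 'mariadb-10.3.9', t):.2f}")
```

For mariadb-10.3.9 the peak is at 32 threads and 256 threads keep only about 0.89 of it, whereas mysql-8.0.11 stays above 0.99; at 256 threads 8.0 is roughly 1.18 times as fast as 10.3.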
Reproduced. Selecting distinct rows from short (10-row) ranges runs at about the same speed in 10.2 and 10.3, while MySQL got much faster in 8.0 compared to 5.7. The effect is more visible with utf8mb4 and with the *_unicode_ci collations.
MariaDB spends a lot of CPU time in my_uca_scanner_next_any.
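To illustrate why a UCA scanner can dominate the profile, here is a simplified, hypothetical Python model. It is not MariaDB's code: the weight values, UCA_TABLE and all function names are made up. A *_general_ci-style comparison maps every character to exactly one weight, while a UCA-style scanner hands out weights one at a time, and a single character may produce several weights (an expansion, e.g. "ß" compared like "ss") or none (an ignorable character). Every step therefore carries extra lookup and bookkeeping work; in MariaDB the scanner additionally works on the encoded byte string, so each step presumably also includes multibyte decoding.

```python
from itertools import zip_longest
from typing import Iterator

# HYPOTHETICAL toy table: the weight values are invented for illustration
# and do not come from MariaDB or from the Unicode DUCET.
UCA_TABLE: dict[str, tuple[int, ...]] = {
    "a": (0x1C47,),
    "b": (0x1C60,),
    "s": (0x1E71,),
    "ß": (0x1E71, 0x1E71),  # expansion: one character, two weights (like "ss")
    "\u00ad": (),           # ignorable: soft hyphen produces no weight at all
}


def general_weight(ch: str) -> int:
    """general_ci-style toy: one lookup, exactly one weight per character."""
    low = ch.lower()
    return ord(low) if len(low) == 1 else ord(ch)


def general_compare(a: str, b: str) -> int:
    # One weight per character, so the keys are plain lists compared directly.
    ka = [general_weight(ch) for ch in a]
    kb = [general_weight(ch) for ch in b]
    return (ka > kb) - (ka < kb)


def uca_scanner(s: str) -> Iterator[int]:
    """UCA-style toy scanner: yields weights one at a time, so a character
    may contribute zero, one or several weights."""
    for ch in s:
        # Characters missing from the toy table fall back to a single weight.
        yield from UCA_TABLE.get(ch.lower(), (general_weight(ch),))


def uca_compare(a: str, b: str) -> int:
    # Pull weights from both scanners in lockstep; -1 pads the shorter side,
    # so the string that runs out of weights first sorts first.
    for wa, wb in zip_longest(uca_scanner(a), uca_scanner(b), fillvalue=-1):
        if wa != wb:
            return -1 if wa < wb else 1
    return 0
```

In this toy, uca_compare("ß", "ss") and uca_compare("a\u00adb", "ab") both return 0, which a one-weight-per-character key cannot express; the cost is one scanner step per weight instead of a plain array comparison.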