Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.5.10, 10.5.11
-
None
-
None
-
None
-
CentOS 7
Description
After recently upgrading MariaDB we started seeing some of our queries' performance significantly decreased.
Given these two tables
MariaDB [(none)]> show create table `region`.`hr_blocked_by_users` \G
|
*************************** 1. row ***************************
|
Table: hr_blocked_by_users
|
Create Table: CREATE TABLE `hr_blocked_by_users` (
|
`user_id` int(11) unsigned NOT NULL,
|
`hr_id` int(11) unsigned NOT NULL,
|
`date` int(10) unsigned NOT NULL DEFAULT 0,
|
PRIMARY KEY (`user_id`,`hr_id`),
|
KEY `user_id__date` (`user_id`,`date`),
|
KEY `hr_id` (`hr_id`)
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
|
1 row in set (0.01 sec)
|
|
|
MariaDB [(none)]> show create table `users_ru`.`reg_users_resumes` \G
|
*************************** 1. row ***************************
|
Table: reg_users_resumes
|
Create Table: CREATE TABLE `reg_users_resumes` (
|
`id_user` int(10) unsigned NOT NULL DEFAULT 0,
|
`id_resume` int(10) unsigned NOT NULL DEFAULT 0,
|
PRIMARY KEY (`id_resume`),
|
KEY `id_user` (`id_user`)
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
|
1 row in set (0.00 sec)
|
and a query
SELECT
|
`rur`.`id_resume`
|
FROM
|
`region`.`hr_blocked_by_users` `blocked`
|
JOIN `users_ru`.`reg_users_resumes` `rur` ON blocked.user_id = rur.id_user
|
WHERE
|
blocked.hr_id = '3675742'
|
we've seen 30-80% slowdown after upgrading from 10.3.22 to 10.5.10.
Query plans seem almost identical, but execution time is much faster for 10.3.22.
-- 10.3.22
|
ANALYZE FORMAT=JSON SELECT SQL_NO_CACHE `rur`.`id_resume` FROM region.`hr_blocked_by_users` `blocked` JOIN users_ru.`reg_users_resumes` `rur` ON blocked.user_id = rur.id_user WHERE blocked.hr_id = '3018027'\G
|
*************************** 1. row ***************************
|
ANALYZE: {
|
"query_block": {
|
"select_id": 1,
|
"r_loops": 1,
|
"r_total_time_ms": 93.691,
|
"table": {
|
"table_name": "blocked",
|
"access_type": "ref",
|
"possible_keys": ["PRIMARY", "user_id__date", "hr_id"],
|
"key": "hr_id",
|
"key_length": "4",
|
"used_key_parts": ["hr_id"],
|
"ref": ["const"],
|
"r_loops": 1,
|
"rows": 69858,
|
"r_rows": 30940,
|
"r_total_time_ms": 5.2686,
|
"filtered": 100,
|
"r_filtered": 100,
|
"using_index": true
|
},
|
"table": {
|
"table_name": "rur",
|
"access_type": "ref",
|
"possible_keys": ["id_user"],
|
"key": "id_user",
|
"key_length": "4",
|
"used_key_parts": ["id_user"],
|
"ref": ["region.blocked.user_id"],
|
"r_loops": 30940,
|
"rows": 1,
|
"r_rows": 1.4677,
|
"r_total_time_ms": 82.421,
|
"filtered": 100,
|
"r_filtered": 100,
|
"using_index": true
|
}
|
}
|
}
|
-- 10.5.10
|
ANALYZE FORMAT=JSON SELECT SQL_NO_CACHE `rur`.`id_resume` FROM region.`hr_blocked_by_users` `blocked` JOIN users_ru.`reg_users_resumes` `rur` ON blocked.user_id = rur.id_user WHERE blocked.hr_id = '3018027'\G
|
*************************** 1. row ***************************
|
ANALYZE: {
|
"query_block": {
|
"select_id": 1,
|
"r_loops": 1,
|
"r_total_time_ms": 168.1655,
|
"table": {
|
"table_name": "blocked",
|
"access_type": "ref",
|
"possible_keys": ["PRIMARY", "user_id__date", "hr_id"],
|
"key": "hr_id",
|
"key_length": "4",
|
"used_key_parts": ["hr_id"],
|
"ref": ["const"],
|
"r_loops": 1,
|
"rows": 66354,
|
"r_rows": 30940,
|
"r_table_time_ms": 13.15574,
|
"r_other_time_ms": 3.069465273,
|
"filtered": 100,
|
"r_filtered": 100,
|
"using_index": true
|
},
|
"table": {
|
"table_name": "rur",
|
"access_type": "ref",
|
"possible_keys": ["id_user"],
|
"key": "id_user",
|
"key_length": "4",
|
"used_key_parts": ["id_user"],
|
"ref": ["region.blocked.user_id"],
|
"r_loops": 30940,
|
"rows": 1,
|
"r_rows": 1.4677117,
|
"r_table_time_ms": 145.4823186,
|
"r_other_time_ms": 6.440280838,
|
"filtered": 100,
|
"r_filtered": 100,
|
"using_index": true
|
}
|
}
|
}
|
Also, in both cases almost all of the accounted time in a query profile is attributed to "Sending data" phase.
-- 10.3.22
|
SHOW PROFILE ALL;
|
+------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
|
| Status | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file | Source_line |
|
+------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
|
| Starting | 0.000072 | 0.000029 | 0.000030 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | NULL | NULL | NULL |
|
| Checking permissions | 0.000015 | 0.000011 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | <unknown> | sql_parse.cc | 6506 |
|
| Opening tables | 0.000022 | 0.000022 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_base.cc | 4076 |
|
| After opening tables | 0.000007 | 0.000007 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_base.cc | 4326 |
|
| System lock | 0.000007 | 0.000007 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 335 |
|
| Table lock | 0.000007 | 0.000007 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 340 |
|
| Init | 0.000020 | 0.000020 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 4273 |
|
| Optimizing | 0.000021 | 0.000021 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 1552 |
|
| Statistics | 0.000079 | 0.000050 | 0.000030 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | <unknown> | sql_select.cc | 1940 |
|
| Preparing | 0.000028 | 0.000027 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | <unknown> | sql_select.cc | 2012 |
|
| Executing | 0.000006 | 0.000006 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 3910 |
|
| Sending data | 0.094987 | 0.076359 | 0.025799 | 220 | 0 | 0 | 24 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 4092 |
|
| End of update loop | 0.000052 | 0.000084 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 4308 |
|
| Query end | 0.000009 | 0.000013 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6041 |
|
| Commit | 0.000007 | 0.000000 | 0.000014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6104 |
|
| Closing tables | 0.000006 | 0.000000 | 0.000013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_base.cc | 728 |
|
| Unlocking tables | 0.000006 | 0.000007 | 0.000004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 426 |
|
| Closing tables | 0.000014 | 0.000029 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 439 |
|
| Starting cleanup | 0.000007 | 0.000023 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6176 |
|
| Freeing items | 0.000011 | 0.000030 | 0.000030 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | <unknown> | sql_parse.cc | 7835 |
|
| Updating status | 0.000020 | 0.000100 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 2408 |
|
| Reset for next command | 0.000015 | 0.000043 | 0.000030 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 2432 |
|
+------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
|
-- 10.5.10
|
SHOW PROFILE ALL;
|
+------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
|
| Status | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file | Source_line |
|
+------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
|
| Starting | 0.000062 | 0.000150 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL |
|
| checking permissions | 0.000010 | 0.000031 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6760 |
|
| Opening tables | 0.000028 | 0.000083 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_base.cc | 4207 |
|
| After opening tables | 0.000010 | 0.000030 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_base.cc | 4464 |
|
| System lock | 0.000027 | 0.000000 | 0.000080 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 337 |
|
| table lock | 0.000013 | 0.000037 | 0.000003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 342 |
|
| init | 0.000019 | 0.000056 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 4735 |
|
| Optimizing | 0.000017 | 0.000053 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 1833 |
|
| Statistics | 0.000078 | 0.000235 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 2292 |
|
| Preparing | 0.000031 | 0.000010 | 0.000084 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 2367 |
|
| Executing | 0.000010 | 0.000027 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_select.cc | 4303 |
|
| Sending data | 0.167589 | 0.375163 | 0.104263 | 1239 | 2 | 32 | 88 | 0 | 0 | 0 | 92 | 0 | <unknown> | sql_select.cc | 4499 |
|
| End of update loop | 0.000110 | 0.000192 | 0.000083 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | <unknown> | sql_select.cc | 4779 |
|
| Query end | 0.000010 | 0.000038 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6067 |
|
| Commit | 0.000011 | 0.000042 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6113 |
|
| closing tables | 0.000009 | 0.000037 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_base.cc | 784 |
|
| Unlocking tables | 0.000010 | 0.000040 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 429 |
|
| closing tables | 0.000021 | 0.000084 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | lock.cc | 442 |
|
| Starting cleanup | 0.000009 | 0.000034 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 6179 |
|
| Freeing items | 0.000015 | 0.000061 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 8116 |
|
| Updating status | 0.000035 | 0.000163 | 0.000084 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 2472 |
|
| Reset for next command | 0.000026 | 0.000170 | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <unknown> | sql_parse.cc | 2496 |
|
+------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
|
To be fair, these servers are not quite identical, but I believe the difference is not enough to explain this much of a difference.
Server with 10.3.22:
- Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
- 512G RAM
- MariaDB 10.3.22 (official CentOS 7 rpm package)
- innodb_buffer_pool_size=400G
- is a replica, replication threads are running
- no other workload
Server with 10.5.10:
- Intel(R) Xeon(R) CPU E7-8891 v4 @ 2.80GHz
- 512G RAM
- MariaDB 10.5.10 (official CentOS 7 rpm package)
- innodb_buffer_pool_size=400G
- is a replica, replication threads are running
- read-only workload
Unfortunately, I couldn't reproduce the issue in a "clean" manner.
I tried the following ways of reproducing:
- loading these tables from sql dump into a fresh 10.5.10 docker container with similar settings;
- loading these tables from sql dump into a fresh 10.3.22 docker container with similar settings and performing an upgrade to 10.5.10;
- importing tablespaces from a read-only replica.
In all of these cases performance did not differ from a container running 10.3.22.
Also, I tried to test the performance of this query on different replicas.
For measurement these scripts were used:
(
|
# warmup |
for i in $(seq 1 3); do |
mysql -rN -h $MYSQL_HOST < $SQL_FILE > /dev/null |
done |
echo "r_total_time_ms"; |
for i in $(seq 1 50); do |
(
|
echo "ANALYZE FORMAT=JSON" |
cat $SQL_FILE |
) | mysql -rN -h $MYSQL_HOST | jq .query_block.r_total_time_ms
|
done |
) | python stat.py
|
# cat stat.py
|
import pandas as pd |
import sys |
|
|
df = pd.read_csv(sys.stdin) |
col = df['r_total_time_ms'] |
mean = col.mean() |
std = col.std() |
print("Time (mean ± σ): {:4.2f} ± {:4.2f}".format(mean, std)) |
print("Range (-3σ ... +3σ) {:4.2f} ... {:4.2f}".format(mean - 3 * std, mean + 3 * std)) |
print(" (min ... max): {:4.2f} ... {:4.2f}".format(col.min(), col.max())) |
Results were the following:
server #1
|
Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
|
512G RAM
|
MariaDB 10.3.22 (official rpm package)
|
innodb_buffer_pool_size=400G
|
active replica
|
no other workload
|
|
|
Time (mean ± σ): 89.20 ± 2.03
|
Range (-3σ ... +3σ) 83.10 ... 95.31
|
(min ... max): 86.81 ... 98.48
|
server #2
|
Intel(R) Xeon(R) CPU E7-8891 v4 @ 2.80GHz
|
512G RAM
|
MariaDB 10.5.10 (official rpm package)
|
innodb_buffer_pool_size=400G
|
active replica
|
'work_mirror' workload
|
|
|
Time (mean ± σ): 187.52 ± 6.04
|
Range (-3σ ... +3σ) 169.40 ... 205.64
|
(min ... max): 175.38 ... 208.00
|
server #3
|
Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz
|
128G RAM
|
MariaDB 10.5.11 (official rpm package)
|
innodb_buffer_pool_size=80G
|
active replica
|
read-only workload
|
|
|
Time (mean ± σ): 127.07 ± 2.38
|
Range (-3σ ... +3σ) 119.92 ... 134.21
|
(min ... max): 119.95 ... 133.93
|
server #4
|
Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz
|
128G RAM
|
MariaDB 10.5.11 (official rpm package)
|
innodb_buffer_pool_size=80G
|
active replica
|
no workload
|
|
|
Time (mean ± σ): 117.29 ± 3.17
|
Range (-3σ ... +3σ) 107.79 ... 126.79
|
(min ... max): 111.47 ... 123.27
|
Performance becomes somewhat better when performance_schema is disabled (or even with performance-schema-instrument='stage/%=OFF')
server #4 with performance-schema-instrument='stage/%=OFF'
|
Time (mean ± σ): 110.02 ± 1.79
|
Range (-3σ ... +3σ) 104.66 ... 115.39
|
(min ... max): 104.99 ... 113.35
|
What's puzzling for me is that, as I mentioned earlier, this doesn't manifest in a docker environment.
developer machine
|
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
|
CentOS 7 + mainline kernel (5.2.2)
|
32G RAM
|
MariaDB 10.5.10/10.3.22 (official docker image)
|
innodb_buffer_pool_size=24G
|
no replication
|
no workload
|
|
|
10.3.22
|
Time (mean ± σ): 72.17 ± 0.71
|
Range (-3σ ... +3σ) 70.05 ... 74.29
|
(min ... max): 71.03 ... 74.48
|
|
|
10.5.10
|
Time (mean ± σ): 71.20 ± 1.13
|
Range (-3σ ... +3σ) 67.81 ... 74.59
|
(min ... max): 69.95 ... 77.26
|
Because of that I'm inclined to think that the issue may very well be not in the MariaDB code itself, but that MariaDB 10.5 requires some system settings to be changed.