[MDEV-10064] performance regression with threadpool

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Not a Bug
Affects Version/s: 10.0.25, 10.1.14
Fix Version/s: 10.1.25
Component/s: OTHER
Labels:
- threadpool
Environment:
Ubuntu x86_64

Sprint:
10.0.26, 10.0.28, 5.5.55, 10.0.30

Description

Enabling the thread pool leads to about 5% performance loss in MariaDB 10.0 and 10.1, but not in MariaDB 5.5. I tested 5.5.49 vs. 10.0.25 vs. 10.1.14.

The benchmark is sysbench OLTP read-only with 1000 point-selects per transaction. The benchmark machine has 16 cores (32 hyperthreads).

my.cnf:

[mysqld]
max_connections = 1300
table_open_cache = 2600
query_cache_type = 0
innodb_buffer_pool_size = 512M
innodb_buffer_pool_instances = 10
innodb_adaptive_hash_index_partitions = 20
thread_handling = pool-of-threads

See the attached spreadsheet for the numbers.

Attachments


- one_thread.txt (77 kB), Vladislav Vaintroub, 2016-06-23 18:13
- pool.txt (77 kB), Vladislav Vaintroub, 2016-06-23 18:13
- threadpool.ods (56 kB), Axel Schwenke, 2016-05-17 12:09
- tp10.png (60 kB), Axel Schwenke, 2016-05-17 11:29
- tp1000.png (57 kB), Axel Schwenke, 2016-05-17 11:29

Activity


Vladislav Vaintroub added a comment - 2016-06-28 11:29 - edited

OK, I measured some more, with and without taskset. Without taskset, what may appear to be a very slight regression shows up, specifically in the threadpool case. But this is a phantom regression. As mentioned elsewhere (e.g. in the threadpool documentation, in the section on how to run benchmarks), the benchmark driver takes a bigger share of the overall CPU. Concretely, in this case with 10.1 and no pinning, you can get a situation where sysbench-0.5 uses 10 CPUs out of 32 while mysqld uses 22, as shown by "top". Idle time is 0%: all 32 CPUs are busy. However, mysqld can do more if affinitized (it can use up to 24 CPUs, which results in better throughput, but then sysbench needs to be restricted). In all of my affinitized tests, threadpool outperforms thread-per-connection (whether the latter is affinitized or not). Overall, in all of my tests, threadpool continues to scale above 1024 concurrent selects.

Either I am doing something wrong on my end, or I'd say the benchmarks were not run properly: the same hardware can do better, and threadpool can outperform thread-per-connection in all aspects, including raw throughput, if the benchmark is run using taskset as described in the threadpool documentation.
taskset really makes a visible difference.

I shared my results here:
https://docs.google.com/spreadsheets/d/12KPobxrP89BzrevPaCoGxGUPnI4kuLWRtTLjTfPJw78/edit#gid=0

axel, I'm reassigning this back. Could you please confirm my findings (in which case I think the MDEV can be closed), or tell me whether I am doing something wrong?

I shared details of how I ran the benchmarks, including the sysbench and mysqld parameters (and the taskset parameters), in this comment:

https://jira.mariadb.org/browse/MDEV-10064?focusedCommentId=84510&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-84510
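On Linux, `taskset -c` applies its CPU list through the sched_setaffinity(2) system call, and processes started afterwards inherit the resulting mask. The following is a minimal C sketch of that mechanism; `pin_to_cpu_range` and `allowed_cpu_count` are hypothetical helpers for illustration, not code from MariaDB, sysbench, or util-linux.

```c
#define _GNU_SOURCE /* needed for cpu_set_t, CPU_* macros and sched_*affinity */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the caller to CPUs first..last (inclusive),
 * using the same kernel call that `taskset -c first-last` relies on.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
static int pin_to_cpu_range(int first, int last)
{
    cpu_set_t set;
    assert(first >= 0 && first <= last && last < CPU_SETSIZE);

    CPU_ZERO(&set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, &set);

    /* pid 0 means "the calling thread"; in a single-threaded process this
     * is the whole process, and the mask survives fork() and execve(). */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: number of CPUs the caller may currently run on,
 * or -1 if the affinity mask cannot be read. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Giving mysqld one range (for example the 24 CPUs mentioned above) and sysbench a disjoint range keeps the load generator from taking CPU time away from the server.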


Marko Mäkelä added a comment - 2017-03-01 17:06

If the origin of this regression is suspected to be this Percona XtraDB commit, then I presume that the condition

	} else if (free_len > max_free_len / 5) {

should be preserved intact.
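The quoted condition is a strict comparison against one fifth of `max_free_len`, computed with integer division. The sketch below isolates it, assuming both values are unsigned counts as their names suggest; the wrapper `free_list_above_fifth` is hypothetical and is not the XtraDB function the line comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper around the quoted condition. free_len and
 * max_free_len are the names from the snippet; their types are assumed. */
static bool free_list_above_fifth(size_t free_len, size_t max_free_len)
{
    /* Integer division truncates: the threshold is 200 for
     * max_free_len = 1000 and 1 for max_free_len = 7. The comparison is
     * strict, so a length of exactly one fifth does not satisfy it. */
    return free_len > max_free_len / 5;
}
```

For example, with `max_free_len = 1000` a free-list length of 201 satisfies the condition while 200 does not.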


Vladislav Vaintroub added a comment - 2017-03-01 17:11

marko, I think this comment belongs to ~~MDEV-10409~~, not this one.


Axel Schwenke added a comment - 2017-07-20 13:22

Added a test case to the regression test suite to test thread pool behavior for all MariaDB releases, starting with 5.5.


Axel Schwenke added a comment - 2017-07-25 09:32

Could not find any regression with a 16:16 split of hyperthreads. Performance with threadpool enabled is flat across releases, and at high thread counts it is slightly better with threadpool enabled than with one-thread-per-connection.
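One way to express such a split on the 32-hyperthread machine from the description is as two contiguous CPU ranges in the "first-last" form that `taskset -c` accepts. The helper below is hypothetical and assumes contiguous numbering is an acceptable partition; whether CPUs 0-15 and 16-31 keep sibling hyperthreads together or separate them depends on the machine's topology (see /sys/devices/system/cpu/cpu*/topology/thread_siblings_list).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split logical CPUs 0..total-1 into a server range
 * 0..server_cpus-1 and a client range server_cpus..total-1, each written
 * as "first-last". Returns 0 on success, -1 if the split is invalid or a
 * buffer is too small for the formatted range. */
static int split_cpu_ranges(int total, int server_cpus,
                            char *server, size_t server_len,
                            char *client, size_t client_len)
{
    if (total < 2 || server_cpus < 1 || server_cpus >= total)
        return -1;

    int n = snprintf(server, server_len, "%d-%d", 0, server_cpus - 1);
    if (n < 0 || (size_t)n >= server_len)
        return -1;

    n = snprintf(client, client_len, "%d-%d", server_cpus, total - 1);
    if (n < 0 || (size_t)n >= client_len)
        return -1;

    return 0;
}
```

For the 16:16 split this yields "0-15" for the server and "16-31" for the benchmark client; for the 24-CPU server allocation discussed earlier it yields "0-23" and "24-31".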

