[MDEV-10064] performance regression with threadpool Created: 2016-05-13 Updated: 2017-07-25 Resolved: 2017-07-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 10.0.25, 10.1.14 |
| Fix Version/s: | 10.1.25 |
| Type: | Bug | Priority: | Major |
| Reporter: | Axel Schwenke | Assignee: | Axel Schwenke |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | threadpool |
| Environment: |
Ubuntu x86_64 |
||
| Attachments: |
|
| Sprint: | 10.0.26, 10.0.28, 5.5.55, 10.0.30 |
| Description |
|
Enabling the thread pool leads to about 5% performance loss in MariaDB 10.0 and 10.1, but not in MariaDB 5.5. I tested 5.5.49 vs. 10.0.25 vs. 10.1.14. The benchmark is sysbench OLTP read-only with 1000 point-selects per transaction. The benchmark machine has 16 cores (32 hyperthreads). my.cnf:
See the attached spreadsheet for numbers. |
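A minimal sketch of how a figure such as "about 5% performance loss" can be derived from two throughput measurements. The function name and its arguments are hypothetical; the actual numbers are in the attached spreadsheet and are not reproduced here.

```python
def relative_change_percent(baseline_qps, candidate_qps):
    """Return the throughput change of candidate vs. baseline, in percent.

    Negative values mean the candidate is slower than the baseline.
    Both arguments are hypothetical throughput figures (e.g. queries/second).
    """
    if baseline_qps <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps
```

For instance, a configuration that delivers 95 units of throughput against a baseline of 100 yields a change of -5 percent.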
| Comments |
| Comment by Vladislav Vaintroub [ 2016-05-13 ] |
|
axel, could you share your benchmark scripts too? There are some relevant parameters, such as the number of rows, the number of tables, etc. I'd like to reproduce the results exactly as described. |
| Comment by Axel Schwenke [ 2016-05-17 ] |
|
I did a new set of benchmark runs, now going up to 4K benchmark threads (128x overloading the machine). I also did another round with only 10 selects per trx, which is nearly the original OLTP workload. In general I see degradation at 4K threads, both with and without the thread pool. |
| Comment by Vladislav Vaintroub [ 2016-06-23 ] |
|
I did a run with 10 point-selects per transaction, and my results for 10.0.26 were nowhere near the reported ones, on a supposedly similar machine. In my tests, thread-per-connection crumbles badly after 1K users, while pool-of-threads exceeds the numbers in tp10.png by a wide margin (reaching its maximum of ~400K qps at concurrency 126-512 and slowly going down to ~300K qps at concurrency 8192; sysbench 0.4 shows even better results). Granted, I run my tests manually, since I was not able to get the automated scripts running easily, but this should make no difference. I used, I hope, nearly the same benchmark specification: 20 tables with 50000 rows each (1 million rows overall), plus the cnf file and sysbench parameters below. All tests were run on a warm database (after cleanup/prepare). Here is the sysbench 0.5 call, which I ran for N=4 ... 8192:
Here is what I have in my.ini (adapted from my.cnf.01, with a higher prepared-statements limit and a large max-connections):
I start mysqld (pool-of-threads) with:
I start mysqld (thread-per-connection) with:
taskset is used to produce the best results (if taskset is not used, the results are still about the same). Anyway, here are the results I get:
Pool-of-threads
With all the information given above, it should be easily reproducible on another machine, I guess. I will look into 10.1 tomorrow. Perhaps it was a good XtraDB merge in 10.0 that introduced that difference; there were a lot of changes in os0sync etc. Maybe this turned out to give that boost to the threadpool, I dunno. |
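A minimal sketch of the concurrency sweep described in this comment. The exact sysbench call from the ticket was not preserved, so everything here is an assumption: the doubling progression from 4 to 8192, the helper names, and the choice of options (the option names are sysbench 0.5 `oltp.lua` options, but the set actually used may differ).

```python
import shlex


def concurrency_levels(start=4, stop=8192):
    """Return thread counts from start to stop, doubling at each step.

    The doubling progression is an assumption; the ticket only says
    the call was run for N=4 ... 8192.
    """
    levels = []
    n = start
    while n <= stop:
        levels.append(n)
        n *= 2
    return levels


def sysbench_command(threads, tables=20, rows=50000, point_selects=10):
    """Build a hypothetical sysbench 0.5 command line for one concurrency level.

    Table count and row count come from the comment (20 tables, 50000 rows
    each); the remaining options are illustrative, not the original call.
    """
    args = [
        "sysbench",
        "--test=tests/db/oltp.lua",
        f"--oltp-tables-count={tables}",
        f"--oltp-table-size={rows}",
        "--oltp-read-only=on",
        f"--oltp-point-selects={point_selects}",
        f"--num-threads={threads}",
        "run",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Print one command per concurrency level; nothing is executed.
    for n in concurrency_levels():
        print(sysbench_command(n))
```

The script only prints the commands, so the sweep can be reviewed and adjusted (run time, connection parameters, and so on) before anything is run against a server.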
| Comment by Vladislav Vaintroub [ 2016-06-23 ] |
|
Attached raw output from sysbench: pool.txt (for the pool-of-threads test) and one_thread.txt (thread-per-connection). |
| Comment by Vladislav Vaintroub [ 2016-06-23 ] |
|
Ah, also perhaps a relevant detail: this is the default compilation, cmake . && make. |
| Comment by Vladislav Vaintroub [ 2016-06-28 ] |
|
Ok, I measured some more, with and without taskset. One can see what may appear to be a very slight regression if taskset is not used, specifically in the threadpool case. But this is a phantom regression. Indeed, as mentioned elsewhere (e.g. in the threadpool documentation, in the section on how to run benchmarks), the benchmark driver seems to take a bigger share of the overall CPU. Concretely, in this case in 10.1, without pinning, you can get a situation where sysbench-0.5 is using 10 CPUs out of 32 while mysqld is using 22 CPUs, as shown by "top". The idle time is 0%; all 32 CPUs are busy. However, mysqld can do more if affinitized (it can use up to 24 CPUs, which results in better throughput, but then sysbench needs to be restricted). In all of my affinitized tests, threadpool outperforms thread-per-connection (the latter can be affinitized or not). In all of my tests overall, threadpool continues to scale above 1024 concurrent selects. Either there is something I do wrong on my end, or I'd say that the benchmarks were not run properly, and the same hardware can do better, with threadpool outperforming thread-per-connection in all aspects, including raw throughput, if the benchmark is run using taskset, as mentioned in the threadpool documentation. I shared my results in axel, I'm reassigning this back. Could you please confirm my findings (in which case I think the MDEV can be closed), or tell me whether I do something wrong? I shared the details of how I run the benchmarks, including sysbench and mysqld parameters (including the taskset params), in this comment |
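A minimal sketch of the pinning idea discussed above: give mysqld one set of CPUs and the benchmark driver the rest, so the two do not compete. The helper names and the contiguous CPU numbering are assumptions (the taskset parameters actually used are not preserved in this ticket, and real hyperthread sibling numbering depends on the machine's topology). Only `taskset -c <cpu-list> <command>` is relied upon.

```python
def split_cpus(total_cpus, server_cpus):
    """Split CPUs 0..total_cpus-1 into a server range and a driver range.

    Returns two CPU-list strings in the form accepted by `taskset -c`.
    Contiguous numbering is an assumption made for illustration only.
    """
    if not 0 < server_cpus < total_cpus:
        raise ValueError("server_cpus must be between 1 and total_cpus - 1")

    def cpu_range(first, last):
        # A single CPU is written as "N", several as "first-last".
        return str(first) if first == last else f"{first}-{last}"

    server = cpu_range(0, server_cpus - 1)
    driver = cpu_range(server_cpus, total_cpus - 1)
    return server, driver


def pinned(cpu_list, command):
    """Prefix a command (list of arguments) with `taskset -c <cpu_list>`."""
    return ["taskset", "-c", cpu_list] + list(command)


if __name__ == "__main__":
    # 24 CPUs for mysqld and 8 for sysbench on a 32-CPU machine (hypothetical split).
    server, driver = split_cpus(32, 24)
    print(" ".join(pinned(server, ["mysqld"])))
    print(" ".join(pinned(driver, ["sysbench", "run"])))
```

With `split_cpus(32, 16)` the same helper produces an even split of the 32 hyperthreads between server and driver.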
| Comment by Marko Mäkelä [ 2017-03-01 ] |
|
If the origin of this regression is suspected to be this Percona XtraDB commit, then I presume that the condition
should be preserved intact. |
| Comment by Vladislav Vaintroub [ 2017-03-01 ] |
|
marko, I think the comment belongs to |
| Comment by Axel Schwenke [ 2017-07-20 ] |
|
Added a test case to the regression test suite to test thread pool behavior for all MariaDB releases, starting with 5.5. |
| Comment by Axel Schwenke [ 2017-07-25 ] |
|
Could not find any regression with a 16:16 split of hyperthreads. Performance with threadpool enabled is flat across releases, and performance at high thread counts is slightly better with threadpool enabled vs. one-thread-per-connection. |