Here is the first benchmark setup that I ran on a c5.9xlarge AWS instance (36 cores). The dataset is flights, available here. There were 13 million records (to fit everything into memory). I ran a single query that hits PP harder than EM, in 5 threads, using sysbench (see the attached .lua script). Here is the query:
select s from (select count(*) as s from flights group by tail_num) sub;
Here are the results (see also the attached latency distribution histograms).
develop-6, 44d326ef
General statistics:
    total time:                          54.7672s
    total number of events:              500

Latency (ms):
    min:                                  321.53
    avg:                                  547.31
    max:                                  968.99
    95th percentile:                      682.06
    sum:                              273653.81

MCOL-5044-3, 61a1242b
General statistics:
    total time:                          47.2456s
    total number of events:              500

Latency (ms):
    min:                                  212.46
    avg:                                  471.78
    max:                                  708.09
    95th percentile:                      601.29
    sum:                              235890.47

The total time and the 95th percentile latency are about 12% better with the fair scheduling policy.
To be precise, a mixed workload doesn't get as significant a positive effect as PP-heavy queries do: for a mixed workload the improvement lies within statistical error.
The current implementation of the thread pool has a scheduling policy with 3 fixed priorities, which makes it possible to favor higher-priority queries when running primitive jobs in PrimProc. The policy picks a number of morsel tasks (3 with default settings) for execution out of a common queue. This scheduling doesn't fit a workload of multiple parallel queries because it tends to allocate all threads to primitive jobs belonging to whichever query reaches PP first. A simplified model of this behavior is sketched below.
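Here is a minimal sketch of that behavior, assuming a single shared FIFO queue and a fixed batch size; the names (Job, commonQueue, pickBatch) are illustrative and not the actual PrimProc identifiers.

// Simplified model of the current policy: every worker grabs a fixed batch
// of morsel tasks from one shared FIFO queue.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Job
{
    uint32_t txnId;    // the query/transaction the primitive job belongs to
    uint32_t priority; // one of 3 fixed priorities
};

std::deque<Job> commonQueue; // jobs arrive in the order queries reach PP

// Each worker takes up to batchSize jobs from the head of the queue.
// A query that reaches PP first enqueues its morsels first, so its jobs
// occupy the head of the queue and all workers end up serving that query
// while later queries wait.
std::vector<Job> pickBatch(std::size_t batchSize = 3)
{
    std::vector<Job> batch;
    while (batch.size() < batchSize && !commonQueue.empty())
    {
        batch.push_back(commonQueue.front());
        commonQueue.pop_front();
    }
    return batch;
}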
The main idea is to replace the existing scheduling policy, i.e. the thread pool, with a fair scheduling policy. Here is the model:
FairThreadPool picks a primitive job that belongs to the transaction with the lowest combined cost of completed primitive jobs. (Whoever wants the technical details of the implementation, please look at the commits.)
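Below is a minimal sketch of that selection rule under the stated model, not the actual FairThreadPool code from the commits; the per-transaction queues, the cost field, and all identifiers are assumptions made for illustration.

// Fair selection: always serve the transaction that has completed the least
// work so far, so a late-arriving query is not starved by the first one.
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <queue>

struct Job
{
    uint32_t txnId;
    uint64_t cost; // weight accounted once the job completes
};

class FairScheduler
{
    std::map<uint32_t, std::queue<Job>> pending_; // per-transaction job queues
    std::map<uint32_t, uint64_t> completedCost_;  // combined cost of finished jobs

  public:
    void submit(const Job& job)
    {
        pending_[job.txnId].push(job);
        completedCost_.emplace(job.txnId, 0); // no-op if the txn is already known
    }

    // Pick the next job from the transaction with the lowest combined cost
    // of completed primitive jobs.
    std::optional<Job> pickNext()
    {
        uint64_t best = std::numeric_limits<uint64_t>::max();
        std::queue<Job>* bestQueue = nullptr;
        for (auto& [txn, queue] : pending_)
        {
            if (!queue.empty() && completedCost_[txn] < best)
            {
                best = completedCost_[txn];
                bestQueue = &queue;
            }
        }
        if (!bestQueue)
            return std::nullopt;
        Job job = bestQueue->front();
        bestQueue->pop();
        return job;
    }

    // Called by a worker after it finishes a job.
    void onCompleted(const Job& job) { completedCost_[job.txnId] += job.cost; }
};

The effect is that a query which has already consumed a lot of PP time yields threads to newer queries, instead of the first arrival monopolizing the pool.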