Here is the first benchmark setup that I ran on a c5.9xlarge AWS instance (36 cores). The dataset is flights, available here. There were 13 million records (to fit everything into memory). I ran a single query that hits PP harder than EM, in 5 threads, using sysbench (see the attached .lua script). Here is the query:
select s from (select count(*) as s from flights group by tail_num) sub;
Here are the results (see also the attached latency distribution histograms).
develop-6, 44d326ef
General statistics:
    total time:                          54.7672s
    total number of events:              500

Latency (ms):
    min:                                  321.53
    avg:                                  547.31
    max:                                  968.99
    95th percentile:                      682.06
    sum:                              273653.81

MCOL-5044-3, 61a1242b
General statistics:
    total time:                          47.2456s
    total number of events:              500

Latency (ms):
    min:                                  212.46
    avg:                                  471.78
    max:                                  708.09
    95th percentile:                      601.29
    sum:                              235890.47

The total time and the 95th percentile latency are about 12% better with the fair scheduling policy.
To be precise, a mixed workload doesn't get as significant a positive effect as PP-heavy queries do: for a mixed workload the improvement lies within statistical error.
The current implementation of the thread pool has a scheduling policy with 3 fixed priorities, which makes it possible to favor higher-priority queries when running primitive jobs in PrimProc. The policy picks a number of morsel tasks (3 with default settings) for execution out of a common queue. This scheduling doesn't fit a workload of multiple parallel queries because it tends to allocate all threads to primitive jobs belonging to whichever query reaches PP first. A simplified model of this behavior is sketched below.
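Here is a minimal sketch of that behavior, assuming a single shared FIFO queue and a fixed batch size; the names (Job, commonQueue, pickBatch) are illustrative and not the actual PrimProc identifiers.

// Simplified model of the current policy: every worker grabs a fixed batch
// of morsel tasks from one shared FIFO queue.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Job
{
    uint32_t txnId;    // the query/transaction the primitive job belongs to
    uint32_t priority; // one of 3 fixed priorities
};

std::deque<Job> commonQueue; // jobs arrive in the order queries reach PP

// Each worker takes up to batchSize jobs from the head of the queue.
// A query that reaches PP first enqueues its morsels first, so its jobs
// occupy the head of the queue and all workers end up serving that query
// while later queries wait.
std::vector<Job> pickBatch(std::size_t batchSize = 3)
{
    std::vector<Job> batch;
    while (batch.size() < batchSize && !commonQueue.empty())
    {
        batch.push_back(commonQueue.front());
        commonQueue.pop_front();
    }
    return batch;
}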
The main idea is to replace the existing scheduling policy, i.e. the thread pool, with a fair scheduling policy. Here is the model:
FairThreadPool picks a primitive job that belongs to the transaction with the lowest combined cost of completed primitive jobs. (Whoever wants the technical details of the implementation, please look at the commits.)
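Below is a minimal sketch of that selection rule under the stated model, not the actual FairThreadPool code from the commits; the per-transaction queues, the cost field, and all identifiers are assumptions made for illustration.

// Fair selection: always serve the transaction that has completed the least
// work so far, so a late-arriving query is not starved by the first one.
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <queue>

struct Job
{
    uint32_t txnId;
    uint64_t cost; // weight accounted once the job completes
};

class FairScheduler
{
    std::map<uint32_t, std::queue<Job>> pending_; // per-transaction job queues
    std::map<uint32_t, uint64_t> completedCost_;  // combined cost of finished jobs

  public:
    void submit(const Job& job)
    {
        pending_[job.txnId].push(job);
        completedCost_.emplace(job.txnId, 0); // no-op if the txn is already known
    }

    // Pick the next job from the transaction with the lowest combined cost
    // of completed primitive jobs.
    std::optional<Job> pickNext()
    {
        uint64_t best = std::numeric_limits<uint64_t>::max();
        std::queue<Job>* bestQueue = nullptr;
        for (auto& [txn, queue] : pending_)
        {
            if (!queue.empty() && completedCost_[txn] < best)
            {
                best = completedCost_[txn];
                bestQueue = &queue;
            }
        }
        if (!bestQueue)
            return std::nullopt;
        Job job = bestQueue->front();
        bestQueue->pop();
        return job;
    }

    // Called by a worker after it finishes a job.
    void onCompleted(const Job& job) { completedCost_[job.txnId] += job.cost; }
};

The effect is that a query which has already consumed a lot of PP time yields threads to newer queries, instead of the first arrival monopolizing the pool.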