[MCOL-4849] Optimize ExeMgr to reduce the number of context switches

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 6.1.1
Fix Version/s: 6.2.2
Component/s: ExeMgr
Labels: None
Epic Link: ColumnStore Performance Improvements
Sprint: 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15

Description

MCS generates an enormous number of context switches, which degrades performance significantly. The screenshot cs.png demonstrates this: the context-switch count rises from ~80 to 21k once I run a single query (select l_orderkey, count(l_orderkey) from lineitem group by l_orderkey limit 10) in an infinite loop.
According to observations made with the perf tool (collected with the command perf record --call-graph dwarf -e context-switches -p $(pidof ExeMgr)), there are a number of candidates for optimization:

messageqcpp::InetStreamSocket::readToMagic(), which both uses poll and reads one byte at a time, increasing the number of context switches needed.
joblist::TupleBPS::receiveMultiPrimitiveMessages(), which contains a loop that processes the intermediate RGData sent by PP to EM as results of primitive requests. Both mutexes and condition variables are widely used in the code of the TupleBPS methods.
joblist::FIFO<rowgroup::RGData>::swapBuffers(), which uses mutexes to protect the critical section of the RGData queue from TBPS to TAS.
Please take a look at perf.png for some details.
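
To illustrate why reading one byte at a time is costly, here is a minimal sketch that counts read calls when scanning a stream for a magic marker. It is not the messageqcpp code: CountingSource, scanByteAtATime, scanChunked and the marker bytes used with them are all hypothetical names made up for this example, and an in-memory buffer stands in for the socket. Each read() call models one potentially blocking syscall.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a socket (not part of messageqcpp): serves bytes
// from memory and counts how many read() calls were made.
struct CountingSource {
    std::vector<std::uint8_t> data;
    std::size_t pos = 0;
    std::size_t readCalls = 0;

    // Copies up to `len` bytes into `out`; returns bytes copied (0 at end).
    std::size_t read(std::uint8_t* out, std::size_t len) {
        ++readCalls;
        const std::size_t n = std::min(len, data.size() - pos);
        std::copy_n(data.begin() + pos, n, out);
        pos += n;
        return n;
    }
};

// Scans for `magic` using one read() call per byte.
bool scanByteAtATime(CountingSource& src,
                     const std::vector<std::uint8_t>& magic) {
    std::vector<std::uint8_t> window;  // last magic.size() bytes seen
    std::uint8_t b = 0;
    while (src.read(&b, 1) == 1) {
        window.push_back(b);
        if (window.size() > magic.size()) window.erase(window.begin());
        if (window == magic) return true;
    }
    return false;
}

// Scans for `magic` using one read() call per chunk of up to `chunkSize`
// bytes. The whole buffer is re-searched after each chunk, which is simple
// but not optimal.
bool scanChunked(CountingSource& src,
                 const std::vector<std::uint8_t>& magic,
                 std::size_t chunkSize) {
    assert(chunkSize > 0);
    std::vector<std::uint8_t> buf;
    std::vector<std::uint8_t> chunk(chunkSize);
    std::size_t n = 0;
    while ((n = src.read(chunk.data(), chunk.size())) > 0) {
        buf.insert(buf.end(), chunk.begin(), chunk.begin() + n);
        if (std::search(buf.begin(), buf.end(),
                        magic.begin(), magic.end()) != buf.end()) {
            return true;
        }
    }
    return false;
}
```

With a marker placed after 100 bytes of padding, the byte-wise scan needs one call per byte, while the chunked scan with a 64-byte chunk needs two calls. A real chunked reader would also have to keep the bytes read past the marker for the message body, which this sketch ignores.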
The goal is to reduce the number of context switches produced by ExeMgr code. It is worth keeping in mind that other categories of queries might produce a different pattern. However, the three parts mentioned affect every query because they are essential to query processing.
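
For the mutex-protected queue, one generic way to take the lock less often is to hand over items in batches. The sketch below only shows that idea and is not necessarily what the fix does: CountingQueue is a hypothetical class, not joblist::FIFO, and its lock counter exists only to make the difference visible.

```cpp
#include <cstddef>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue (not joblist::FIFO) that counts how often a producer
// takes the mutex.
template <typename T>
class CountingQueue {
public:
    // One lock acquisition per item.
    void push(T item) {
        std::lock_guard<std::mutex> lk(mutex_);
        ++producerLocks_;
        items_.push_back(std::move(item));
    }

    // One lock acquisition per batch; `batch` is left empty for reuse.
    void pushBatch(std::vector<T>& batch) {
        std::lock_guard<std::mutex> lk(mutex_);
        ++producerLocks_;
        if (items_.empty()) {
            items_.swap(batch);  // O(1) hand-over, batch becomes empty
        } else {
            items_.insert(items_.end(),
                          std::make_move_iterator(batch.begin()),
                          std::make_move_iterator(batch.end()));
            batch.clear();
        }
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lk(mutex_);
        return items_.size();
    }

    // Number of producer-side lock acquisitions (push and pushBatch only).
    std::size_t producerLocks() {
        std::lock_guard<std::mutex> lk(mutex_);
        return producerLocks_;
    }

private:
    std::mutex mutex_;
    std::vector<T> items_;
    std::size_t producerLocks_ = 0;
};
```

Pushing 1000 items one by one takes the lock 1000 times; collecting them into local batches of 100 and calling pushBatch takes it 10 times. Fewer lock acquisitions mean fewer chances for a thread to block on a contended mutex, at the cost of consumers seeing items slightly later.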

Attachments

cs.png (226 kB, 2021-08-30 12:13)
graph.svg (23 kB, 2021-08-30 18:20)
perf.png (485 kB, 2021-08-30 12:36)
profile.txt (109 kB, 2021-08-30 18:30)
Screenshot from 2021-09-01 20-22-08.png (119 kB, 2021-09-01 17:25)

Issue Links

relates to

MCOL-4593 Multiple concurrent queries with aggregates are bottlenecked, result in lack of user scalability

Stalled

People

Assignee: Daniel Lee (Inactive)

Reporter: Roman

Votes: 0

Watchers: 3

Dates

Created: 2021-08-30 12:34

Updated: 2024-10-03 15:53

Resolved: 2021-11-17 22:37
