[MCOL-4849] Optimize ExeMgr to reduce a number of context switches - Jira

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 6.1.1
Fix Version/s: 6.2.2
Component/s: ExeMgr
Labels: None

Epic Link: ColumnStore Performance Improvements
Sprint: 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15

Description

MCS generates an enormous number of context switches, which degrades performance significantly. The screenshot cs.png demonstrates this: the context-switch count rises from ~80 to 21k once I run a single query (select l_orderkey, count(l_orderkey) from lineitem group by l_orderkey limit 10) in an infinite loop.
According to observations made with the perf tool (collected with the command perf record --call-graph dwarf -e context-switches -p $(pidof ExeMgr)), there are a number of candidates for optimization:

- messageqcpp::InetStreamSocket::readToMagic(), which both uses poll and reads a byte at a time, increasing the number of context switches needed.
- joblist::TupleBPS::receiveMultiPrimitiveMessages(), which contains a loop that processes the intermediate RGData sent by PP to EM as results of primitive requests. Both mutexes and condition variables are widely used in the code of the TupleBPS methods.
- joblist::FIFO<rowgroup::RGData>::swapBuffers(), which uses mutexes to protect the critical section of the RGData queue that runs from TBPS to TAS.
Please take a look at perf.png for some details.
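To illustrate why the byte-at-a-time pattern in the first candidate is costly, here is a minimal sketch that scans a stream for a marker in two ways and counts the read calls. Everything in it is an assumption made for illustration: ReadFn stands in for a blocking read on a socket, kMagic is a made-up 4-byte marker (not the real ColumnStore value), and CountingSource is a test double. It is not the code of readToMagic().

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a blocking read: fills up to len bytes and
// returns how many were written (0 means end of stream).
using ReadFn = std::function<std::size_t(char* buf, std::size_t len)>;

// Hypothetical marker with four distinct bytes (not the real ColumnStore magic).
constexpr std::array<char, 4> kMagic = {'\x01', '\x02', '\x03', '\x04'};

// Test double: serves bytes from a string and counts how often it is called.
struct CountingSource
{
  explicit CountingSource(std::string d) : data(std::move(d)) {}
  std::size_t operator()(char* buf, std::size_t len)
  {
    ++calls;
    const std::size_t n = std::min(len, data.size() - pos);
    std::memcpy(buf, data.data() + pos, n);
    pos += n;
    return n;
  }
  std::string data;
  std::size_t pos = 0;
  std::size_t calls = 0;
};

// One read call per byte. Returns the offset of the marker, or -1.
// The simple restart logic is only valid because the marker bytes are distinct.
long findMagicByteAtATime(const ReadFn& readSome)
{
  std::size_t matched = 0;
  long offset = 0;
  char c = 0;
  while (readSome(&c, 1) == 1)
  {
    ++offset;
    if (c == kMagic[matched])
    {
      if (++matched == kMagic.size())
        return offset - static_cast<long>(kMagic.size());
    }
    else
    {
      matched = (c == kMagic[0]) ? 1 : 0;
    }
  }
  return -1;
}

// One read call per chunk. Returns the offset of the marker, or -1.
long findMagicBuffered(const ReadFn& readSome, std::size_t chunkSize = 4096)
{
  std::string seen;  // everything read so far (kept whole for simplicity)
  std::vector<char> chunk(chunkSize);
  const std::size_t keep = kMagic.size() - 1;
  for (;;)
  {
    const std::size_t n = readSome(chunk.data(), chunk.size());
    if (n == 0)
      return -1;  // stream ended before the marker appeared
    // Re-scan only the tail that could hold a marker split across two chunks.
    const std::size_t from = seen.size() >= keep ? seen.size() - keep : 0;
    seen.append(chunk.data(), n);
    const auto it = std::search(seen.begin() + from, seen.end(), kMagic.begin(), kMagic.end());
    if (it != seen.end())
      return static_cast<long>(it - seen.begin());
  }
}
```

With a marker at offset 10, the byte-at-a-time variant issues 14 reads, while the buffered variant with 8-byte chunks issues 2. On a real socket each read is a system call and a possible context switch. Note that a buffered reader consumes bytes past the marker, so a real implementation would have to hand that leftover to whatever parses the message body.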
The goal is to reduce the number of context switches produced by ExeMgr code. It is worth keeping in mind that other categories of queries might produce a different pattern. However, the three parts mentioned affect every query because they are essential to query processing.
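For the mutex-protected queue between job steps, one common way to cut lock traffic is to hand items over in batches, so the mutex is taken once per batch rather than once per item. The sketch below only illustrates that general idea. BatchQueue and all of its members are hypothetical names, lockCount() exists purely to make the effect visible, and nothing here is taken from joblist::FIFO or from the fix that was eventually merged.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue (not joblist::FIFO): the producer publishes whole
// batches and the consumer drains everything pending with a single swap.
template <typename T>
class BatchQueue
{
 public:
  // Producer side: move a whole batch in under one lock acquisition.
  void pushBatch(std::vector<T>&& batch)
  {
    if (batch.empty())
      return;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (auto& item : batch)
        pending_.push_back(std::move(item));
      ++lockCount_;
    }
    batch.clear();
    ready_.notify_one();  // notify after unlocking so the waiter is not blocked on the mutex
  }

  // Consumer side: wait until data is pending or the queue is closed,
  // then take everything at once. An empty result means closed and drained.
  std::vector<T> popAll()
  {
    std::vector<T> out;
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return !pending_.empty() || closed_; });
    out.swap(pending_);
    ++lockCount_;
    return out;
  }

  // Signals that no more batches will arrive.
  void close()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    ready_.notify_all();
  }

  // Instrumentation for this example only: number of push/pop lock acquisitions.
  std::size_t lockCount() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return lockCount_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable ready_;
  std::vector<T> pending_;
  bool closed_ = false;
  std::size_t lockCount_ = 0;
};
```

Pushing two batches of three and two items and then draining them takes three lock acquisitions in total, where a per-item design would take at least one per item on each side. The trade-off is latency: the consumer sees nothing until the producer publishes a batch, so the batch size would need tuning against the workload.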

Attachments

cs.png (226 kB, 2021-08-30 12:13)
graph.svg (23 kB, 2021-08-30 18:20)
perf.png (485 kB, 2021-08-30 12:36)
profile.txt (109 kB, 2021-08-30 18:30)
Screenshot from 2021-09-01 20-22-08.png (119 kB, 2021-09-01 17:25)

Issue Links

relates to

MCOL-4593 Multiple concurrent queries with aggregates are bottlenecked, result in lack of user scalability (Stalled)

Activity

Daniel Lee (Inactive) added a comment - 2021-11-17 22:37

Build tested: 6.2.2 (#3334, #3335)

Build performance compared to release 6.1.1

Build #3334

   DBT3 performance

      Disk-run   is 10.05% faster

      Cached-run is 10.84% faster

   CPImport is       5.71% faster

   LDI is            1.03% faster

   insertSelect is   2.42% faster

Build #3335

   DBT3 performance

      Disk-run   is 10.69% faster

      Cached-run is 12.07% faster

   CPImport is       1.43% faster

   LDI is            1.20% faster

   insertSelect is   2.76% faster

Detailed performance test result can be found here:

https://docs.google.com/spreadsheets/d/1tznQqmpKfkbnn4HjfjYIIeowJlQpNlHj8PVI6QM-3mc/edit?usp=sharing

People

Assignee: Daniel Lee (Inactive)

Reporter: Roman

Votes: 0

Watchers: 3

Dates

Created: 2021-08-30 12:34

Updated: 2024-10-03 15:53

Resolved: 2021-11-17 22:37
