[MCOL-2009] Fix jobstep abort - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.7, 1.2.3
Component/s: ExeMgr
Labels: None
Sprint: 2018-21, 2019-01

Description

Summary of the backtraces in the associated support ticket:

mysqld is waiting for results from a system catalog (syscat) query.
ExeMgr has ~500 jobstep threads in the process of aborting, but blocked trying to send data to the next jobstep.
PrimProc is idle.

From the email I sent to the team:
"There are several joblists running, and all are in the process of aborting. TupleBPS threads are blocked trying to send data downstream. On abort, a joblist needs to be aborted ‘down-up’. On noticing the query was aborted, all jobsteps need to stop sending data downstream, then consume all of their remaining input to be sure that upstream jobsteps get unblocked, so that they can abort next. My suspicion is that there is a jobstep that isn’t implementing that completely right. From the backtraces I can’t tell which jobstep it is though, because it has already gone away (without draining its input)."

It should be easy to find now that we know what to look for. Start by searching each jobstep for references to the cancelled() function to locate its abort logic. Odds are that one of them returns without draining its input first.
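The drain-on-abort rule described above can be illustrated with a minimal, self-contained sketch. It is not ColumnStore code: `BoundedQueue`, `AbortResult`, and `runAbortedStep` are hypothetical names, the bounded queue stands in for whatever carries data between jobsteps, and the `cancelled` flag only mimics the role of `cancelled()`. The point it shows is that a step which notices the abort stops forwarding data but keeps consuming its input, so the upstream thread is never left blocked on a full queue.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical bounded FIFO standing in for the link between two jobsteps.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full. This is where an upstream step gets
    // stuck if the downstream step returns without draining.
    void push(int value) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(value);
        notEmpty_.notify_one();
    }

    // Signals that no more input will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
    }

    // Returns false once the queue is closed and empty.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        notFull_.notify_one();
        return true;
    }

private:
    std::size_t capacity_;
    std::queue<int> items_;
    bool closed_ = false;
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
};

struct AbortResult {
    int consumed;   // items read from the input queue
    int forwarded;  // items that would have been sent downstream
};

// Runs one upstream producer and one downstream step. The abort is simulated
// as arriving after `cancelAfter` items have been consumed.
AbortResult runAbortedStep(int totalItems, int cancelAfter) {
    assert(totalItems >= 0 && cancelAfter >= 0);
    BoundedQueue input(4);
    std::atomic<bool> cancelled(false);  // stand-in for a cancelled() check

    std::thread upstream([&input, totalItems] {
        for (int i = 0; i < totalItems; ++i) input.push(i);
        input.close();
    });

    AbortResult result = {0, 0};
    int item = 0;
    while (input.pop(item)) {
        ++result.consumed;
        if (result.consumed > cancelAfter) cancelled = true;
        // After the abort: stop sending downstream, but keep popping so the
        // upstream thread can finish its pushes and exit.
        if (!cancelled.load()) ++result.forwarded;
    }
    upstream.join();  // completes only because the input was fully drained
    return result;
}
```

With `runAbortedStep(100, 10)`, all 100 items are consumed while only the first 10 are forwarded. If the loop instead returned as soon as `cancelled` became true, the upstream thread would fill the four-slot queue and block in `push()` indefinitely, which matches the symptom seen in the backtraces.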

Issue Links

relates to:
MCOL-1702 Joblist thread pool leaks if mariadb client connection drops its connection early. (Closed)
MCOL-2104 Killed query locks ExeMgr and PrimProc and get all cpu resources (Closed)

People

Assignee: Daniel Lee (Inactive)
Reporter: Patrick LeBlanc (Inactive)
Votes: 3
Watchers: 6

Dates

Created: 2018-12-10 19:45
Updated: 2024-07-08 02:31
Resolved: 2019-01-24 21:04
