[MCOL-3983] segv from cpimport bulk load preparation - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.3.1
Component/s: cpimport, Storage Manager
Labels: None
Environment: SkySQL, columnstore 1.4.3-1
Sprint: 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17

Description

A customer ran into a problem that caused StorageManager (SM) to restart continuously. The core file showed 886 threads, and the ones I examined had implausible backtraces. For example, according to gdb, the ultimate cause of the crash was an assertion failure in the string dtor, called from the metadataObject dtor, yet the line it points at instantiates a metadataObject rather than destroying one. That failure causes fatalHandler() to run, which segfaults, causing fatalHandler() to run again.

My suspicion is that there is a general synchronization problem that results in memory corruption, with all of the random fallout that can follow from that. We need to follow up on things like Synchronizer::process(), where we use references to strings in a list (need to verify that the iterator can't be invalidated and the value can't change during use, etc.).
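To illustrate the kind of hazard being described, here is a minimal sketch of one common way to avoid holding references into a shared list while other threads may modify it: copy the entries under a lock and work on the copy. The class PendingKeys and its methods are hypothetical names invented for this example; this is not the StorageManager code and makes no claim about how Synchronizer::process() is actually written.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container of object keys shared between threads.
// Illustration only; not a StorageManager class.
class PendingKeys {
public:
    void add(std::string key) {
        std::lock_guard<std::mutex> guard(mutex_);
        keys_.push_back(std::move(key));
    }

    void remove(const std::string& key) {
        std::lock_guard<std::mutex> guard(mutex_);
        // Erasing an element invalidates any reference or iterator to it.
        keys_.remove(key);
    }

    // Return copies made under the lock, so the caller never holds a
    // reference into keys_ after the lock is released.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return std::vector<std::string>(keys_.begin(), keys_.end());
    }

private:
    mutable std::mutex mutex_;
    std::list<std::string> keys_;
};
```

The trade-off is an extra copy per pass in exchange for never dereferencing an element that another thread may have erased. Whether that cost is acceptable depends on how large the list gets in practice.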

This ticket is for general robustification of StorageManager. Also need to figure out how they got up to 886 threads right away (or ever for that matter).

Unclear whether licensing restrictions prevent me from saving the core file somewhere and linking the ticket to it. I'll do that once I get the go-ahead.

They were running 10.4.12-6 enterprise with columnstore @ 'columnstore-1.4.3-1'

Update: I found a bug in the config listeners for Downloader and Synchronizer: if the config file has max_concurrent_uploads/downloads = 20, those threadpools never have a limit imposed on start. That could explain the whole problem. If the OS decides we've started too many threads too fast, or we hit a thread limit, that could cause a wide range of problems. Still, a scan through the code looking for races would be justified.
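One plausible way a listener can end up never imposing a limit for one specific value is a "changed value" check against a cached default. The sketch below assumes that pattern purely for illustration; the ticket does not state the actual cause. PoolSketch, BuggyListener, FixedListener, and the cached default of 20 are all hypothetical names and values, not the StorageManager implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical thread pool stand-in: unlimited until setMaxThreads() is called.
class PoolSketch {
public:
    void setMaxThreads(std::size_t n) { maxThreads_ = n; }
    std::size_t maxThreads() const { return maxThreads_; }

private:
    std::size_t maxThreads_ = std::numeric_limits<std::size_t>::max();
};

// Assumed buggy pattern: the cached value starts at the default (20) and the
// pool is updated only when the configured value differs from the cache.
// A config value of 20 therefore never reaches the pool.
class BuggyListener {
public:
    explicit BuggyListener(PoolSketch& pool) : pool_(pool) {}

    void onConfig(std::size_t configured) {
        if (configured != cached_) {
            cached_ = configured;
            pool_.setMaxThreads(cached_);
        }
    }

private:
    PoolSketch& pool_;
    std::size_t cached_ = 20;
};

// One possible fix: apply the default to the pool at construction, so the
// pool and the cached value agree before the first config read.
class FixedListener {
public:
    explicit FixedListener(PoolSketch& pool) : pool_(pool) {
        pool_.setMaxThreads(cached_);
    }

    void onConfig(std::size_t configured) {
        if (configured != cached_) {
            cached_ = configured;
            pool_.setMaxThreads(cached_);
        }
    }

private:
    PoolSketch& pool_;
    std::size_t cached_ = 20;
};
```

With the buggy variant, calling onConfig(20) leaves the pool unlimited, while any other value applies correctly, which would match the symptom of the limit failing only at 20.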

Attachments

all-threads-backtrace.txt (1.80 MB, 2020-05-04 20:30)

Issue Links

is duplicated by: MCOL-4003 Thread Concurrency Variables Not Limiting When Set to 20 (Closed)

is part of: MCOL-4343 umbrella for tech debt issues (Open)

People

Assignee: Daniel Lee (Inactive)

Reporter: Patrick LeBlanc (Inactive)

Votes: 1

Watchers: 6

Dates

Created: 2020-05-04 20:29

Updated: 2024-07-08 02:01

Resolved: 2022-02-24 16:37
