[MCOL-5587] Columnstore crashes/unstable on too large selects - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 23.02.3, 23.02.4, 23.10.0
Fix Version/s: 23.10.3
Component/s: None
Labels:
- rm_stability
- triage
Environment:
Docker and AWS EC2
4 cpu, 16 GB ram

Sprint:
2023-11, 2023-12, 2024-2

Description

Summary: Running too large a SELECT causes PrimProc to disappear. This appears to be linked to RAM usage and the system running out of memory.
One workaround is to add the "-q" flag to the mariadb client so that the results from ColumnStore are not fully buffered in mariadbd before being returned to the client. However, this workaround is unacceptable for third-party integrations and for a simplified user experience.
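
For illustration, the quick-mode workaround can be sketched as a client invocation built from a script. This is a minimal sketch, not the reporter's reproduction: the database name `test`, the table name `big_table`, and the helper `quick_client_argv` are hypothetical. `--quick` is the long form of the client's `-q` option.

```python
import shlex


def quick_client_argv(query, database="test", client="mariadb"):
    """Build an argv list for the MariaDB command-line client in quick mode.

    The database and query are hypothetical placeholders; --quick is the
    long form of the -q flag mentioned in the description.
    """
    return [client, "--quick", f"--database={database}", f"--execute={query}"]


argv = quick_client_argv("SELECT * FROM big_table")
# Print a shell-safe version of the command rather than running it here.
print(shlex.join(argv))
```

The list can be passed to `subprocess.run(argv)` on a host where the client is installed.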

Expectation: ColumnStore stays stable. It should reject or error out on queries that are too large, recover on its own, or perhaps return an error message suggesting what is needed to complete the query (CPU/RAM). A subprocess disappearing and the system staying broken until someone manually restarts it is not acceptable.

Workaround: restart columnstore

Reproduction: See developer comment

Client Side error:

ERROR 1815 (HY000) at line 1: Internal error: MCS-2004: Cannot connect to ExeMgr.

primproc.log

getFreeMemory : returned from  getMemUsageFromCGroup : usage 5211672576 (GIB) 4
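
The trailing `4` in the line above is consistent with the byte count divided by 2^30 and truncated to a whole number. That reading is an assumption drawn from the numbers alone, not from the PrimProc source. A quick check of the arithmetic:

```python
GIB = 1 << 30  # bytes in one GiB

usage_bytes = 5211672576  # value taken from the primproc.log line

whole_gib = usage_bytes // GIB  # integer division, as the log value suggests
exact_gib = usage_bytes / GIB   # fractional value for comparison

print(whole_gib)            # 4
print(round(exact_gib, 2))  # 4.85
```

Under that assumption, the truncated figure understates the reported usage by roughly 0.85 GiB.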

debug.log

Oct  6 17:29:47 mcs1 messagequeue[794]: 47.156748 |0|0|0| W 31 CAL0071: InetStreamSocket::read: timeout during first read: socket read error: Success; InetStreamSocket: sd: 65 inet: 127.0.0.1 port: 8601; Will retry.

mariadb-error.log

ClientRotator caught exception: InetStreamSocket::connect: connect() error: Connection refused to: InetStreamSocket: sd: 64 inet: 127.0.0.1 port: 8601

Attachments

- primproc.log (11 kB, 2023-10-09 15:38)
- Right-before-server-crash.png (300 kB, 2023-11-22 18:20)

Issue Links

- is blocked by: MDEV-34704 Quick mode produces the bug for mariadb client (Closed)
- is part of: MCOL-5766 Add `quick` setting to [client] section of columnstore.cnf (Closed)

Activity

Roman added a comment - 2023-10-07 08:30

Did I get it right that the memory allowance in Sky for the pod is 5211672576 bytes?

Leonid Fedorov added a comment - 2023-10-09 12:18

kirill.perov@mariadb.com please try the attached reproduction script on a similar AWS EC2 VM, and on a bigger one as well.

Kirill Perov (Inactive) added a comment - 2023-10-10 14:00 - edited

I ran the replay 4 times on the same 4-CPU, 16 GB AWS VM.
No crashes.

mariadb-plugin-columnstore 10.6.15.10-23.02.4+maria~ubu2204 amd64

The only errors I see are from malformed queries.

People

Assignee: Alan Mologorsky

Reporter: Allen Herrera

Assigned for Review: Leonid Fedorov

Assigned for Testing: Kirill Perov (Inactive)

Votes: 1

Watchers: 11

Dates

Created: 2023-10-06 20:35

Updated: 2025-02-20 15:56

Resolved: 2025-01-09 14:43
