[MCOL-1770] Columnstore restart fails when the Swap is over threshold
Created: 2018-10-04  Updated: 2021-01-15

Status: Closed
Project: MariaDB ColumnStore
Component/s: DMLProc, ExeMgr
Affects Version/s: 1.0.11
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Abhinav santi Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

AWS AMI, 1 UM (User Module), 4 PMs (Performance Modules)



 Description   

ColumnStore initiated a restart when UM1's swap usage reached the major alarm threshold, but the system fails to come back up because swap usage is still at the major threshold.
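To confirm whether swap usage on the UM really stays above the alarm level after a restart, it can be checked directly. The following is a minimal sketch of a hypothetical helper (not part of ColumnStore) that reads Linux's `/proc/meminfo`, where `SwapTotal` and `SwapFree` are reported in kB, and compares the used percentage with a threshold. The 80% value is a placeholder assumption, not ColumnStore's configured alarm level.

```python
from pathlib import Path

# Hypothetical value for illustration only; the real major-alarm threshold
# is whatever is configured in ColumnStore and may differ.
MAJOR_SWAP_THRESHOLD_PCT = 80.0


def swap_used_percent(meminfo_text: str) -> float:
    """Return swap usage in percent, parsed from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are in kB
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured at all
    return 100.0 * (total - free) / total


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        pct = swap_used_percent(meminfo.read_text())
        over = pct >= MAJOR_SWAP_THRESHOLD_PCT
        print(f"swap used: {pct:.1f}% (over threshold: {over})")
```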

Manually stopping and starting the system, or shutting it down, did not work either. Is this a known bug?



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-10-04 ]

Hi,

There are no known issues regarding this. Can you please provide a ColumnStore support report so we can look into this for you?

https://mariadb.com/kb/en/library/system-troubleshooting-mariadb-columnstore/#mariadb-columnstore-support-report-tool

Comment by Abhinav santi [ 2018-10-04 ]

Is there a private location where I can share the data? I apologize, but I cannot share it here in the community Jira.

Comment by Andrew Hutchings (Inactive) [ 2018-10-04 ]

Yes, you can use our write-only FTP server: ftp://ftp.mariadb.com/uploads

Let us know the file name and we can view it.

Comment by Abhinav santi [ 2018-10-04 ]

I am unable to upload to ftp.mariadb.com; the upload fails with a 553 error. Do I have sufficient permissions?

ftp> put /tmp/columnstoreSupportReport.prod-cs-pm1.tar.gz /private/columnstoreSupportReport.prod-cs-pm1.tar.gz
local: /tmp/columnstoreSupportReport.prod-cs-pm1.tar.gz remote: /private/columnstoreSupportReport.prod-cs-pm1.tar.gz
227 Entering Passive Mode (184,106,201,174,195,190).
553 Could not create file.

Comment by Andrew Hutchings (Inactive) [ 2018-10-04 ]

Hi,

There is already a file on the server with that filename and it won't let you overwrite it. Can you please add something unique to the filename?
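One way to make the name unique is to prefix it with a timestamp, similar to the `new.` prefix the reporter ended up using. A minimal sketch, assuming a UTC timestamp prefix is acceptable to the upload server; the helper name `unique_upload_name` is hypothetical.

```python
import time
from pathlib import PurePosixPath
from typing import Optional


def unique_upload_name(local_path: str, stamp: Optional[float] = None) -> str:
    """Prefix the file's base name with a UTC timestamp so it won't collide."""
    # time.gmtime(None) falls back to the current time.
    prefix = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(stamp))
    return f"{prefix}.{PurePosixPath(local_path).name}"


if __name__ == "__main__":
    print(unique_upload_name("/tmp/columnstoreSupportReport.prod-cs-pm1.tar.gz"))
```

The resulting name can then be given as the remote file name in the ftp client's `put` command.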

Comment by Abhinav santi [ 2018-10-04 ]

Uploaded new.columnstoreSupportReport.prod-cs-pm1.tar.gz to the FTP location.

Comment by Andrew Hutchings (Inactive) [ 2018-10-05 ]

There is a lot going on here so I'll try and break down what I'm seeing:

  • There are times when many queries are being executed simultaneously. These use a lot of RAM in ExeMgr, especially if each one has a large result set. While ColumnStore can run multiple queries in parallel, this is not recommended and there is little to no performance gain in doing so, because every query already uses all the CPUs where possible.
  • The swap space on the system is very small, which is likely why the alarms trigger so easily even after the ColumnStore processes are restarted: the kernel has probably moved something else into swap.
  • You have a lot of zombie cpimport processes. I'm not sure how that happened; my guess is that something went wrong during a LOAD DATA INFILE or INSERT ... SELECT.
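On Linux, a zombie process shows state `Z` in `/proc/<pid>/stat`. A minimal sketch for listing them, assuming a Linux `/proc` filesystem and matching by command-name prefix so that both `cpimport` and `cpimport.bin` are caught; the helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def parse_proc_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split on the *last* closing parenthesis.
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def find_zombies(name_prefix: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs of zombie (state 'Z') processes whose name starts with name_prefix."""
    zombies: List[int] = []
    root = Path(proc_root)
    if not root.is_dir():
        return zombies
    for stat_file in root.glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_proc_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited or file unreadable; skip it
        if comm.startswith(name_prefix) and state == "Z":
            zombies.append(pid)
    return zombies


if __name__ == "__main__":
    print("zombie cpimport PIDs:", find_zombies("cpimport"))
```

A zombie has already exited; it only disappears once its parent reaps it or the parent itself exits, which is why a reboot clears them.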

I highly recommend increasing your swap file size. For that much RAM I would say 8GB is the bare minimum, and I would aim for something closer to 1/3 of your RAM. I would also recommend rebooting the UM to clear out all the zombie processes and the current swap usage.
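The sizing rule of thumb above can be expressed as a simple formula: take roughly a third of RAM, but no less than 8GB. A sketch, assuming the 8GB floor applies generally (the comment states it only for this system's RAM size) and treating sizes as plain GB figures; the function name is hypothetical.

```python
MIN_SWAP_GB = 8.0  # "bare minimum" from the recommendation above


def recommended_swap_gb(ram_gb: float) -> float:
    """Suggested swap size: about 1/3 of RAM, but never below MIN_SWAP_GB."""
    return max(MIN_SWAP_GB, ram_gb / 3)


if __name__ == "__main__":
    for ram in (16, 60, 96):
        print(f"{ram} GB RAM -> {recommended_swap_gb(ram):.1f} GB swap")
```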

Generated at Thu Feb 08 02:31:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.