[MCOL-5050] Worker node crash after DDL. Possibly Docker only Created: 2022-04-12  Updated: 2022-05-17  Resolved: 2022-04-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.1

Type: Bug Priority: Blocker
Reporter: Todd Stoffel (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Docker 3 node CS cluster with MaxScale


Attachments: PNG File Screen Shot 2022-04-12 at 2.01.36 PM.png    
Issue Links:
Problem/Incident
is caused by MCOL-4912 MCS bulk insertion is slow Closed
is caused by MCOL-5057 EM index code miscalculates RAM neede... Closed
Relates
relates to MCOL-5088 Docker: Cluster creation did not fini... Stalled

 Description   

Note: This must be tested with cluster (not single node).

Beginning after build "cron 4208", ColumnStore DDL now takes up to 300 seconds to complete. This is true with or without S3 (StorageManager).

To reproduce use:

https://github.com/mariadb-corporation/mariadb-enterprise-columnstore-docker

Customize the included .env file and change the following lines to use later Drone builds:

https://github.com/mariadb-corporation/mariadb-enterprise-columnstore-docker/blob/master/Dockerfile#L59-L69

Then run

docker-compose up -d && docker exec -it mcs1 provision

It hangs after the "Validating" step, which is when it tries to clean up the sample schema.

I have noticed that you must create a table before things start to go haywire. Simply creating a database and dropping it is not enough.
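
The reproduction described above (create a database, create a table, then drop both) could be timed with a small script like the one below. This is only a sketch under assumptions: the container name mcs1 comes from the command above, but the `mariadb` client being available inside the container without extra credentials, and the schema and table names, are hypothetical and chosen for illustration.

```python
import subprocess
import time

# Upper bound on DDL duration reported in this ticket.
SLOW_DDL_SECONDS = 300

# Hypothetical schema and table names, used for illustration only.
REPRO_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS mcol5050",
    "CREATE TABLE mcol5050.t1 (id INT) ENGINE=ColumnStore",
    "DROP TABLE mcol5050.t1",
    "DROP DATABASE mcol5050",
]


def build_ddl_command(sql, container="mcs1", client="mariadb"):
    """Return the argv that runs one SQL statement inside the container.

    Assumption: the container ships a `mariadb` client that can log in
    without extra credentials. Adjust for your setup.
    """
    return ["docker", "exec", container, client, "-e", sql]


def time_statement(sql, runner=subprocess.run, clock=time.monotonic):
    """Run one statement and return the elapsed wall-clock seconds."""
    start = clock()
    runner(build_ddl_command(sql), check=True)
    return clock() - start


def run_repro(runner=subprocess.run, clock=time.monotonic):
    """Time every repro statement and return a list of (sql, seconds)."""
    return [(sql, time_statement(sql, runner, clock)) for sql in REPRO_STATEMENTS]
```

Calling `run_repro()` on the Docker host after provisioning and printing the returned pairs should make it obvious whether the DROP statements approach the 300 second mark.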



 Comments   
Comment by Roman [ 2022-04-18 ]

tntnatbry The crashes are caused by the EM index code, and the crash traces are available in /var/log/mariadb/columnstore/traces.
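
To look at the most recent crash traces on a node, something like the following could be run inside the container. It is a minimal sketch: the directory is the one named in this comment, while the function name and the default limit of five files are assumptions made for illustration.

```python
from pathlib import Path

# Trace directory named in the comment above.
TRACE_DIR = Path("/var/log/mariadb/columnstore/traces")


def newest_traces(trace_dir=TRACE_DIR, limit=5):
    """Return up to `limit` regular files in trace_dir, newest first."""
    trace_dir = Path(trace_dir)
    if not trace_dir.is_dir():
        # No trace directory means nothing to report on this node.
        return []
    files = [p for p in trace_dir.iterdir() if p.is_file()]
    # Sort by modification time so the latest crash comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

An empty result on a node that has lost its workernode process would suggest the traces are being written somewhere else or not at all.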

Comment by alexey vorovich (Inactive) [ 2022-04-18 ]

drrtuy

So who will work on this? You?

Do we understand why this reproduces in Docker only? Isn't the EM code written in C++ and run both with and without Docker?

Comment by alexey vorovich (Inactive) [ 2022-04-19 ]

dleeyh it appears to be caused by MCOL-5057, so I am moving this to test as well.

drrtuy if it is not related to MCOL-5057, please reverse this.

Comment by Daniel Lee (Inactive) [ 2022-04-20 ]

The issue caused all three Docker nodes to be in read-only mode. The workernode process is not running on the slave node.
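
A quick way to confirm which nodes have lost the workernode process might look like the sketch below. It assumes that `ps` is installed in each container and that the process shows up under the command name workernode. Any container names other than mcs1 are also assumptions.

```python
import subprocess


def process_listed(ps_output, name="workernode"):
    """Return True if any line of `ps -eo comm=` output equals `name`."""
    return any(line.strip() == name for line in ps_output.splitlines())


def workernode_status(containers, run=subprocess.run):
    """Map each container name to True when workernode is running in it.

    Assumption: `ps` exists inside the containers. Swap in another
    process listing command if it does not.
    """
    status = {}
    for container in containers:
        result = run(
            ["docker", "exec", container, "ps", "-eo", "comm="],
            capture_output=True,
            text=True,
            check=True,
        )
        status[container] = process_listed(result.stdout)
    return status
```

For a three-node cluster this could be called as `workernode_status(["mcs1", "mcs2", "mcs3"])`, where mcs2 and mcs3 are assumed names for the other two nodes.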

Comment by Daniel Lee (Inactive) [ 2022-04-20 ]

Build verified: 6.3.1-1 (#4299)

Verified using a 3-node cluster with Docker.
