[MCOL-4863] storagemanager process would not start until... Created: 2021-09-10  Updated: 2021-11-17  Resolved: 2021-11-15

Status: Closed
Project: MariaDB ColumnStore
Component/s: Storage Manager
Affects Version/s: None
Fix Version/s: 6.1.1

Type: Bug
Priority: Major
Reporter: Edward Stoever
Assignee: Unassigned
Resolution: Not a Bug
Votes: 0
Labels: None


 Description   

While working on a ColumnStore cluster recently, I could not get storagemanager to start on any node until I edited /lib/systemd/system/mcs-loadbrm.service, changing "After" to "Requires" as shown here:

#After=network.target mcs-storagemanager.service
Requires=network.target mcs-storagemanager.service
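
A possible alternative to editing the packaged unit file in place is a systemd drop-in, which leaves the packaged file untouched. The sketch below is an assumption rather than anything ColumnStore ships: the function name and the drop-in file name are hypothetical. It also keeps an After= line, because in systemd Requires= by itself declares a dependency but does not order the two units.

```shell
#!/bin/sh
# Sketch: declare a hard dependency on mcs-storagemanager through a drop-in.
# write_loadbrm_dropin and requires-storagemanager.conf are illustrative names.

write_loadbrm_dropin() {
    # $1: systemd configuration root (/etc/systemd/system on a real host)
    dropin_dir="$1/mcs-loadbrm.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Requires= adds the dependency; After= preserves the start ordering.
    cat > "$dropin_dir/requires-storagemanager.conf" <<'EOF'
[Unit]
Requires=mcs-storagemanager.service
After=mcs-storagemanager.service
EOF
}
```

On a real node this would be run as root, for example `write_loadbrm_dropin /etc/systemd/system && systemctl daemon-reload`, so that systemd picks up the new drop-in.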



 Comments   
Comment by Edward Stoever [ 2021-09-12 ]

Another workaround that gives the same result in this situation is to edit /usr/bin/mariadb-columnstore-start.sh so that it starts mcs-storagemanager before any other process:

/bin/systemctl start mcs-storagemanager
sleep 1

Update on October 6, 2021: use Requires as shown in the original post. It works better than this workaround.

Comment by Edward Stoever [ 2021-09-12 ]

Now that I have seen this behavior in two environments, it is worth mentioning that it seems to happen when [ObjectStorage] service = LocalStorage is set and NFS-mounted directories are used for shared storage in data1, data2, etc.
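
To check whether a node matches this configuration, the service value can be read from the [ObjectStorage] section of the StorageManager configuration file (/etc/columnstore/storagemanager.cnf, as named later in this thread). This is a minimal sketch with a hypothetical function name. It only inspects the file and does not reproduce the detection logic ColumnStore itself uses.

```shell
#!/bin/sh
# Sketch: print the "service" value from the [ObjectStorage] section
# of a storagemanager.cnf-style file. objectstorage_service is an illustrative name.

objectstorage_service() {
    # $1: path to the configuration file
    awk '
        # Any section header: remember whether we are entering [ObjectStorage].
        /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*[[]ObjectStorage[]]/); next }
        # First uncommented "service = ..." line inside that section.
        in_section && /^[ \t]*service[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop the key and the equals sign
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            print
            exit
        }
    ' "$1"
}
```

For example, `objectstorage_service /etc/columnstore/storagemanager.cnf` prints the configured service; `mount -t nfs,nfs4` can then show whether the data directories are NFS mounts.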

Comment by Edward Stoever [ 2021-10-06 ]

It is a good idea to make the same change in /usr/share/columnstore/mcs-loadbrm.service. That file is copied over /lib/systemd/system/mcs-loadbrm.service whenever the post-install script runs, so it is better to change both files. This can be done with two simple commands:

perl -p -i -e "s/After/Requires/g" /lib/systemd/system/mcs-loadbrm.service
perl -p -i -e "s/After/Requires/g" /usr/share/columnstore/mcs-loadbrm.service
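
Note that the pattern s/After/Requires/g rewrites every occurrence of "After" in the file, including any in comments or descriptions. A more conservative variant, sketched here with a hypothetical function name, touches only a directive at the start of a line and keeps a .bak copy of each file.

```shell
#!/bin/sh
# Sketch: change only a leading "After=" directive to "Requires=",
# leaving a backup next to each edited file. after_to_requires is an illustrative name.

after_to_requires() {
    for unit_file in "$@"; do
        # -i.bak edits in place and saves the original as <file>.bak
        sed -i.bak 's/^After=/Requires=/' "$unit_file" || return 1
    done
}
```

It could be called as `after_to_requires /lib/systemd/system/mcs-loadbrm.service /usr/share/columnstore/mcs-loadbrm.service`, followed by `systemctl daemon-reload` so that systemd rereads the changed unit.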

Comment by Roman [ 2021-11-10 ]

Edward, the reason it is After and not Requires is that SM must be started, and become a dependency, if and only if S3 storage is used. We check the username/password/bucket tuple in /etc/columnstore/storagemanager.cnf to detect whether S3 is used. If a cluster uses S3, SM is started from the mcs-loadbrm systemd unit (/usr/bin/mcs-loadbrm.py). So we need to find the root cause that prevents your cluster from starting.
To research the case further, you can configure the cluster, shut it down, and manually start the systemd units on the primary node only, in this order: mcs-loadbrm, mcs-workernode@1. This sequence implicitly starts SM.
Feel free to close the issue if my explanation is enough or give more details so we can proceed.
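
The manual start sequence described above could be scripted roughly as follows. The function name and the SYSTEMCTL override are hypothetical conveniences for a dry run; the unit names are the ones mentioned in this issue.

```shell
#!/bin/sh
# Sketch: start the units on the primary node in the suggested order,
# then ask systemd whether StorageManager came up.
# start_primary_manually and SYSTEMCTL are illustrative, not part of ColumnStore.

start_primary_manually() {
    ctl="${SYSTEMCTL:-systemctl}"       # set SYSTEMCTL=echo for a dry run
    "$ctl" start mcs-loadbrm || return 1
    "$ctl" start mcs-workernode@1 || return 1
    # Reports "active" and returns 0 if the unit is running.
    "$ctl" is-active mcs-storagemanager
}
```

Running `SYSTEMCTL=echo start_primary_manually` only prints the three commands. If a real run reports the unit as inactive, `journalctl -u mcs-storagemanager` is a reasonable place to look for the cause.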

Comment by Edward Stoever [ 2021-11-17 ]

Awesome, thank you Roman and Allen!
