[MCOL-4132] systemd startup logic suggestions Created: 2020-06-30  Updated: 2020-11-21  Resolved: 2020-11-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: installation
Affects Version/s: None
Fix Version/s: 5.4.1

Type: Task Priority: Major
Reporter: Daniel Black Assignee: Roman
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
includes MCOL-4170 Refactor services/systemd units to fi... Closed
Relates
relates to MCOL-70 systemd service support Closed
relates to MCOL-3836 Columnstore OAM replacement Closed

 Description   

MCOL-3618 - the entire install should be runnable as non-root, and as such the systemd service files should have a `User=`.

Paths in scripts need to be at least configurable.

mariadb-columnstore

/usr/bin/mariadb-columnstore-start.sh

#!/bin/bash
 
# This script allows to gracefully start MCS
 
/bin/systemctl start mcs-storagemanager
/bin/systemctl start mcs-loadbrm
/bin/systemctl start mcs-workernode
/bin/systemctl start mcs-controllernode
/bin/systemctl start mcs-primproc
/bin/systemctl start mcs-writeengineserver
/bin/systemctl start mcs-exemgr
/bin/systemctl start mcs-dmlproc
/bin/systemctl start mcs-ddlproc
 
exit 0

This is horrible. Use Requires or similar. Maybe the `PartOf` in other services forms the hierarchy OK.

What I've generally found with systemd is that it is best to specify a dependency only in the unit that requires it. Patching both ends of the relationship just gets messy.
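
A minimal sketch of declaring the relationship on one side only, assuming (purely for illustration) that mcs-workernode needs mcs-loadbrm, as the order in the start script suggests. The function name, the drop-in file name and the target directory are hypothetical; the real dependencies would have to come from the ColumnStore developers.

```shell
#!/bin/bash
# Sketch only: which unit requires which is an assumption read off the
# start script order, not a confirmed design.

# write_dependency_dropin DIR UNIT DEPENDENCY
# Writes DIR/UNIT.d/10-requires.conf so that UNIT requires DEPENDENCY
# and is ordered after it. Only the dependent unit is touched.
write_dependency_dropin() {
    local dir="$1" unit="$2" dep="$3"
    mkdir -p "${dir}/${unit}.d" || return 1
    cat > "${dir}/${unit}.d/10-requires.conf" <<EOF
[Unit]
Requires=${dep}
After=${dep}
EOF
}

# Hypothetical usage (needs root, then `systemctl daemon-reload`):
# write_dependency_dropin /etc/systemd/system mcs-workernode.service mcs-loadbrm.service
```

With relationships like this in place, starting the last unit in the chain would pull in the others in order, which is what the start script does by hand.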

Don't sleep

Use of sleep like:

ExecStart=/usr/bin/env bash -c "/bin/sleep 2 && /usr/bin/DDLProc"

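is an ugly, fragile hack. Type=simple isn't a robust way to handle dependencies on other services, because systemd treats the unit as started as soon as the process has been launched, not when it is actually ready.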

Ways to fix this in preferred order are:

a) Use `Type=notify`

When using this, the server is linked against the systemd library (libsystemd) and calls sd_notify to tell systemd when it is ready.

The MariaDB server has used this since 10.1.8, so see how it does it. READY is sent in sql/mysqld.cc just before the loop that accepts connections.

b) Type=dbus

If you are considering moving all communication to dbus rather than focusing development effort on client/server protocols.

c) Use `Type=forking`.

The initial process should fork, and the parent should exit only once the daemon is ready.

d) Use ExecStartPost

To have some sort of wait condition (e.g. file created, socket exists etc).
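
As a rough sketch of option d), a small helper that polls for a path with a timeout could be called from ExecStartPost. The function name and the example path are hypothetical. It still polls, but on an actual readiness condition rather than a fixed delay.

```shell
#!/bin/bash
# wait_for_path PATH [TIMEOUT_SECONDS]
# Polls until PATH exists (regular file, socket, ...) or the timeout
# expires. Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    local path="$1" timeout="${2:-30}"
    # SECONDS is a bash builtin counting seconds since shell start.
    local deadline=$(( SECONDS + timeout ))
    while [ ! -e "$path" ]; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            echo "timed out waiting for ${path}" >&2
            return 1
        fi
        sleep 0.2
    done
    return 0
}

# Hypothetical usage; the socket path is made up for illustration:
# wait_for_path /tmp/example-ready.sock 30
```

A non-zero exit from ExecStartPost makes systemd treat the start as failed, so a timeout here would surface as a failed unit instead of a silent race.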

LD_PRELOAD jemalloc

ExecStart=/usr/bin/env bash -c "LD_PRELOAD=$(ldconfig -p | grep -m1 libjemalloc | awk '{print $1}') exec /usr/bin/ExeMgr"

So if the libjemalloc is in the loader cache it should be used? This isn't a good idea because:

  • What the service is running depends rather indirectly on the system
  • jemalloc may be built with a prefix and not immediately replace the malloc functions
  • the user can force jemalloc use if they want to by doing a service override (examples https://mariadb.com/kb/en/mariadb/systemd/).
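
For illustration, such an override could be a drop-in that sets LD_PRELOAD explicitly for one unit. A sketch, assuming the administrator knows the library path; the function name, the drop-in file name and the jemalloc path in the usage comment are assumptions, not something shipped by ColumnStore.

```shell
#!/bin/bash
# write_preload_override DIR UNIT LIBRARY
# Writes DIR/UNIT.d/jemalloc.conf setting LD_PRELOAD for that unit only.
# Refuses to write the override if LIBRARY is not readable, so the
# service never silently runs with a different allocator than intended.
write_preload_override() {
    local dir="$1" unit="$2" lib="$3"
    if [ ! -r "$lib" ]; then
        echo "library not readable: ${lib}" >&2
        return 1
    fi
    mkdir -p "${dir}/${unit}.d" || return 1
    printf '[Service]\nEnvironment=LD_PRELOAD=%s\n' "$lib" \
        > "${dir}/${unit}.d/jemalloc.conf"
}

# Hypothetical usage (library path varies by distro; needs root and a
# `systemctl daemon-reload` afterwards):
# write_preload_override /etc/systemd/system mcs-exemgr.service /usr/lib64/libjemalloc.so.2
```

This keeps the choice of allocator explicit and visible in the unit configuration instead of depending on what happens to be in the loader cache.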

ExecStop kill

ExecStop=/usr/bin/env bash -c "kill -15 $MAINPID"

Don't do this. It is close to the default anyway (systemd sends SIGTERM to the service on stop), so it can be omitted.

clearShm

ExecStopPost=/usr/bin/env bash -c "clearShm > /dev/null 2>&1"

Is redirecting the output to null the best idea?

Description

The description of these services is a bit brief.

good bits

Good use of PartOf

Feature enhancements

mcs-workernode.service

This is a good case for a service template, where the instance name is the worker identifier.

mariadb.service

Think about how you want the dependency on mariadb.service to show up. `PartOf` again?



 Comments   
Comment by Daniel Black [ 2020-06-30 ]

Not sure how to execute After a service you are PartOf

mcs-writeengineserver.service
::::::::::::::
[Unit]
Description=WriteEngineServer
PartOf=mcs-exemgr.service
After=mcs-exemgr.service

Comment by Roman [ 2020-07-13 ]

I agree that systemd calls for significant changes; however, the ways suggested aren't appropriate.
/usr/bin/mariadb-columnstore-start.sh is here to provide an umbrella synchronous service that exists when all dependent services have started.
sleep will go away in favor of socket activation. notify doesn't suit multi-service software well, IMHO.
We should use static jemalloc, but for the next couple of releases we stay with what the OS gives us, with the expected drawbacks, and yes, users can override the systemd units if they want to and know how.
Talking about systemd: its defaults are sometimes cryptic, so it is hard to deduce what happens by default. Moreover, if there is no Timeout the service gets SIGKILL-ed right away, IIRC.
The redirect of stdout is a legacy; maybe we'll reconsider it later.
mcs-workernode.service isn't a good example of Template usage b/c the other workernodes live somewhere else in the cluster and don't belong to this node.
We won't have any dependency on mariadb.service, at least for now, b/c the cluster is still separate software with MDB as an essential frontend.

Comment by Daniel Black [ 2020-07-17 ]

Thanks for your consideration.

Socket activation sounds like a good plan. I should revive my archived code on that for the MariaDB server too.

I assume things like exemgr perform multiple things and agree that Type=notify is hard where there is more than one ready state.

Jemalloc: static linking to system libraries tends to violate distro packaging rules. If you can provide a dynamic link option, that would be useful. Are you definitely getting a jemalloc benefit? I've seen some blogs that proclaim glibc has improved malloc performance significantly, even in versions a few years old.

I totally agree that some of the dependency relations are hard to deduce.

A timeout of 0 going to SIGKILL immediately sounds plausible. MDEV-14705 shows how I extended systemd to allow a service to negotiate an extension of the shutdown time for Type=notify services.

mcs-workernode.service - good to know. I was thinking more of testing multiple worker nodes on a single machine, but this isn't an end user consideration, so engineering it for that seems harmful.

Good to know it's separate software. I'll look at whether, in the packaging of the ColumnStore plugin, a mariadb.service extension depends on the right mcs service (or socket eventually).

I didn't fully get to a running mcs, so please forgive errors and assumptions. I hope to get a more complete understanding in the near future.

Comment by Roman [ 2020-07-20 ]

MCS .so and binaries aren't statically but dynamically linked against system libraries, and yes, we benefit from using jemalloc. The default Linux memory allocator allocates a number of pools by default and doesn't return anonymous memory-mapped segments back to the system. From the application's perspective this looks like a memory leak. This happens when software aggressively allocates/deallocates a considerable amount of memory.
Thanks for sharing the link. I'll look into MDEV-14705.
No, this isn't the kind of worker you expect. The main workhorses are: ExeMgr (final aggregator), PrimProc (scanning and most of the processing), WriteEngine (data changes). All the processes use a ThreadPool to execute tasks.
JFYI mcs-dmlproc and mcs-ddlproc are the last units to come online.
Nothing to worry about, and feel free to ask me or anyone from the team on either Zulip or Slack (preferable).

Comment by Gregory Dorman (Inactive) [ 2020-07-22 ]

The 1.5.4 codeline includes a number of temporary parts, which need to be corrected. The bash file was introduced as a last-minute workaround for a self-inflicted problem (a "race condition" when starting up MCS component processes from mariadb-columnstore.service, due to lack of proper synchronization of the starting sequence).

Likewise, 2-second sleeps were injected to temporarily avoid the problem.

Both are to be eliminated by properly using the socket activation method.

Comment by Roman [ 2020-11-20 ]

Most of the suggestions have made their way into the code base, so danblack, thank you for the suggestions. We left some bits unchanged though.
We daemonize our processes to get rid of the sleeps. /usr/bin/mariadb-columnstore-start.sh is still here though. We might start building and ship our own version of jemalloc soon.

Comment by Daniel Black [ 2020-11-20 ]

> We might start building and ship our own version of jemalloc soon.

Doing so will incur the rage and hatred of every distribution packager.

Comment by Roman [ 2020-11-21 ]

Unfortunately for them, I'm happy with that.
IMHO distributions nowadays aren't as important in the modern open-source environment as they were previously, so I would rather make MCS truly cross-platform instead of following ever-changing distros.

Generated at Thu Feb 08 02:48:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.