Hi Daniel. Unfortunately, notifications from this Jira instance don't seem to be reaching the top of my inbox. I saw activity on GitHub, though, and noticed your reply here.
tl;dr: If MariaDB is using Type=notify (and I think it is, IIRC), then you should probably set SendSIGKILL=yes and use EXTEND_TIMEOUT_USEC to avoid timeouts when performing a long but orderly shutdown. KillMode=mixed remains my recommendation unless KillMode=cgroup is known to be correct for all processes.
My Detailed Thoughts
Regarding KillMode=mixed vs. KillMode=cgroup, I tend to feel that "mixed" is more appropriate unless the design of the daemon's process set intends KillMode=cgroup, because it gives the main PID time to shut down its children before systemd blankets all children in shutdown signals. Another way to look at this is to ask what happens when there's a mismatch between daemon expectations and the systemd unit configuration; I'll cover each scenario and the cost of being wrong.
If KillMode=cgroup, but that's wrong for any process...
If the daemon expects to reap its own children, then "cgroup" mode will cause broken behavior because systemd will aggressively send signals to both the main PID and its child processes. This can break internal dependency expectations around signal propagation. That is, if a child process expects the shutdown signal to come from the main PID once the main PID is ready for its children to shut down, then systemd sending the signal to main and child processes at the same time will cause premature shutdown in the children, possibly confusing the main process. Daemons like MariaDB that are portable across systems tend to expect to reap their own children because KillMode=cgroup is a systemd-specific thing AFAIK; that's why I'm suspicious whether "cgroup" is the right choice (or even a safe one) for MariaDB.
So, KillMode=cgroup can introduce hard-to-notice shutdown race conditions when the signal propagation order is important but processes tend to shut down in an acceptable order due to other effects, like the time spent in various phases of shutdown. For example, let's say a parent expects to send the shutdown signal to its child. Suppose that child runs some housekeeping at the beginning of shutdown, which typically keeps it around a bit longer, while the parent interacts with it during the parent's own shutdown. The parent's expectations might then be broken if the child happens to have a particularly quick housekeeping run and disappears because systemd told it to shut down.
This hazard could exist in the other direction, too. Let's say a parent process wants to open a pipe for child processes to communicate their final status during shutdown. It expects to open this pipe and only then send shutdown signals to children. If those children receive a premature shutdown signal from systemd, they might try to use the pipe before the parent process has created it.
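To make the ordering in that pipe example concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not how MariaDB manages its processes: the function name, the FIFO file name, and the "clean" status string are all hypothetical. The parent creates a named pipe first and only then signals the child. If something else (for example systemd under KillMode=cgroup) delivered SIGTERM before the `os.mkfifo` call, the child's `open` would raise FileNotFoundError and it would exit without reporting.

```python
import os
import signal
import tempfile


def run_parent_and_child() -> str:
    """Fork a child, shut it down in the order the parent expects,
    and return the final status the child reported."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "final-status.fifo")  # hypothetical name

    # Block SIGTERM before forking so the child inherits the mask and can
    # wait for the signal synchronously, with no unhandled window.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    child_pid = os.fork()
    if child_pid == 0:
        exit_code = 1
        try:
            # Child: wait for the shutdown signal, then report via the FIFO.
            signal.sigwait({signal.SIGTERM})
            # Fails with FileNotFoundError if the signal arrived before the
            # parent created the FIFO; that is the race described above.
            with open(fifo_path, "w") as fifo:
                fifo.write("clean")
            exit_code = 0
        finally:
            os._exit(exit_code)

    # Parent: restore its own signal mask.
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)

    # Parent shutdown, step 1: create the pipe the child will report on.
    os.mkfifo(fifo_path)
    # Step 2: only now tell the child to shut down.
    os.kill(child_pid, signal.SIGTERM)
    # Step 3: collect the child's final status, then reap it.
    with open(fifo_path) as fifo:
        status = fifo.read()
    os.waitpid(child_pid, 0)

    os.unlink(fifo_path)
    os.rmdir(workdir)
    return status


if __name__ == "__main__":
    print(run_parent_and_child())
```

The sketch only works because the parent alone decides when the child is signalled; nothing in the child defends against an early signal, which is the kind of implicit assumption I'd worry about.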
If KillMode=mixed, but that's wrong for any process...
If the daemon expects systemd to reap its children, then using "mixed" will cause shutdown to hang, usually pending an eventual SIGKILL. However, if MariaDB has SendSIGKILL=no, then I can see why it might cause shutdown/restart to hang indefinitely. An indefinite hang on shutdown is a risk whenever SendSIGKILL=no is configured, though.
Concluding Thoughts and How to Extend Shutdown
Given that you don't want SIGKILL sweeping in prematurely to stop a shutdown in progress, it seems brazen to me to have systemd send all processes shutdown signals, bypassing any process shutdown topology you might otherwise expect to manage internally using signals. The risk of "mixed" seems to be hanging shutdowns, while the risk of "cgroup" seems to be undefined behavior (unless KillMode=cgroup is known to be well-defined). For a database, I'd choose the risk around "mixed" every time (again, unless everything is known to manage process dependencies correctly through other means).
Finally, and perhaps most importantly, I see that you're worried about SIGKILL happening when a shutdown is orderly but long. Is MariaDB using Type=notify? If so, I'd advise using EXTEND_TIMEOUT_USEC= so that orderly shutdowns don't get interrupted, while you still have the guarantee of systemd reaping a failed shutdown.
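For reference, the notification itself is small. Below is a rough Python sketch of what a Type=notify service sends, written against the documented notify-socket protocol (a datagram to the socket named in NOTIFY_SOCKET) rather than copied from MariaDB, which calls the C API. The helper names, the 30-second figure, and the status text are assumptions for illustration. As I understand it, EXTEND_TIMEOUT_USEC has to be re-sent before each extension runs out, so a stalled shutdown still gets reaped once the messages stop.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one datagram to the socket named by NOTIFY_SOCKET.

    Returns False when the variable is unset, i.e. the process was not
    started by a service manager that expects notifications.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True


def extend_stop_timeout(seconds: int, status: str) -> bool:
    """Hypothetical helper: ask for `seconds` more time during shutdown.

    Intended to be called repeatedly while shutdown is making progress.
    """
    usec = seconds * 1_000_000
    return sd_notify(
        f"STOPPING=1\nSTATUS={status}\nEXTEND_TIMEOUT_USEC={usec}"
    )


if __name__ == "__main__":
    # Outside systemd this prints False because NOTIFY_SOCKET is unset.
    print(extend_stop_timeout(30, "flushing buffer pool"))
```

The point of the design is that the timeout stays finite: the daemon buys time only while it can still report progress.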
Notes so far:
KillMode=mixed
I'm almost ok with changing to KillMode=mixed, just need to look at the details below to really justify it.
Seems compatible with f9179b36d313ef50240407fcb2737ac3a0aa3b9e and `SendSIGKILL=no` (https://github.com/systemd/systemd/commit/5bcffb4b549c0d115d8e40137ea885b7568ec6cb).
TODO, see how killing mariadbd would propagate to the shutdown of:
TODO, on the case for the existing KillMode=cgroup, what is the handling of the termination of the above scripts?
SendSIGKILL=no
The case behind this was mainly around a service start-up that was slow due to recovery. With SendSIGKILL=no, systemd's notion of the service can terminate while the process continues to roll back from the undo log.
There is also the case of a service shutdown that is just a bit slow to do all the necessary cleanups. Having it linger a little longer to continue seems like a reasonable constraint.
There are obviously consequences to letting the mariadbd process continue after the service has been shut down. One is that a restart of the service won't see the existing process still running and would hit the mariadbd mechanisms in Aria/InnoDB that lock files for exclusive access to prevent duplicate processes. https://github.com/systemd/systemd/commit/5bcffb4b549c0d115d8e40137ea885b7568ec6cb was written to ensure that protection was applied a little earlier (and obviously only applies to systemd v242 and later).
So it's largely here because the cleanup from a hard kill is quite expensive.
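As a generic illustration of the "lock files for exclusive access" protection mentioned above, here is a small Python sketch. It uses an advisory flock() lock, which is an assumption chosen for brevity and not a description of the actual Aria/InnoDB locking code; the helper name is hypothetical. A second starter fails immediately while the first holder is still alive, and succeeds once the holder closes the file or exits.

```python
import fcntl


def try_exclusive_lock(path: str):
    """Return an open file holding an exclusive advisory lock on `path`,
    or None if another open handle already holds the lock."""
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail at once instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

A lingering process from the previous service instance plays the role of the first holder here, which is why the restarted service cannot come up until it finishes.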
MariaDB does use the extend-timeout Type=notify API of systemd to try to avoid getting into this state.
MDEV-14705. MDEV-17571, https://github.com/MariaDB/server/commit/d78f02d73d5b2f962c0ea6a1198e932c7355adc2#diff-8c0a9bb1f023e03364e3310d3a385ac726a3274d634eba036e64bcf4984555c4, changed this to 15 minutes. Both the extend timeout and SendSIGKILL=no were trying to avoid Timeout(Start,Stop)Sec=infinity and avoid areas of clean-up code (that can take considerable time).
So competing requirements as I see it that have resulted in this balance:
So these are the tradeoffs made. I'm happy to have reflections on problems this has/might cause or better ways of balancing this.