Details
- Type: Task
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Versions: 10.1.7-1, 10.1.7-2, 10.1.8-1, 10.1.8-3
Description
Supporting socket activation would make each of the following possible for admins:
- Cleaner restarts (the listener socket stays open persistently)
- Network namespace isolation, disallowing any network access beyond the inherited listener port (and connections accepted from it).
- Lazy startup for densely hosted instances. (It's also possible with socket activation to start it eagerly, as usual.)
- Running MariaDB on privileged ports without having to start it initially as root
- Non-racy startup for services (like a PHP site) that depend on connecting to MariaDB. Because systemd opens listener sockets early in boot, they're available even while MariaDB is starting
- Deeper integration into coming network support in future systemd releases
Some examples in C are here:
http://0pointer.de/blog/projects/socket-activation.html
I am willing to sponsor development of this feature.
Attachments
- systemd_full.patch (36 kB)
Issue Links
- includes:
  - MDEV-4606 cmake build to be able to choose sysvinit or systemd (or other) init files (Closed)
- is blocked by:
  - MDEV-427 Provide a systemd script for MariaDB (Closed)
- relates to:
  - MDEV-5713 RFE: Add support for systemd notify feature (Closed)
  - MDEV-25233 Review shutdown patterns in systemd service units (Open)
  - MDEV-25282 Auto-shutdown on idle when socket-activated (Stalled)
  - MDEV-6347 Build RHEL7 packages (Closed)
  - MDEV-6536 make --bind=hostname to listen on both IPv6 and IPv4 addresses (Closed)
Activity
Thanks, hhorak! These are very important considerations that should be taken into account.
> Just for anybody who is not willing to read the whole passionate discussion at (1), I'd like to point out the comment #62 there (2), which summarizes the main cons of using socket activation for a database server.
Those aren't cons of using socket activation. At most, they're arguments against enjoying certain benefits, like on-demand startup. (We actually use on-demand startup for about 300,000 MariaDB instances; it's merely an issue of good hygiene during shutdown.) Any issues with on-demand startup are avoidable by simply starting the server at boot, which is the option of any administrator with or without socket activation.
Remember that "socket activation" merely means systemd opens the initial listener and passes it in; it does not mean the service must wait on the first client connection to start. I think people are getting too caught up in the name "socket activation" rather than its functional aspects.
> I'd also add another comment, that according at least Fedora guidelines, socket-activated services need to be autostart (3), which is not something we wish from database server
Fedora packaging guidelines, in general, require RPMs to not start services on installation or by default at boot. This is not a distinction for socket-activated services. If an administrator explicitly "enables" a socket-activated service unit, it will start at boot. This is the same behavior required today for the non-socket-activated MariaDB packages shipped by Fedora.
> and worth noting that other distros can have similar requirements.
Other distros, like Ubuntu and Debian (which are moving to systemd), actually do not have such a restriction on services auto-starting after package installation.
> That having in mind, I don't think it's worth trying to implement socket activation now, unless there is some special use case for such a feature.
The benefits for clean restarts, security isolation, privilege separation, race-condition avoidance, and support for the network features (which are in the current v209 release now) remain useful, even if you think on-demand startup isn't.
That sounds like you have been more successful than us; that's great. davidstrauss, can you share your service and socket unit files as well?
> can you share your service and socket unit files as well?
We don't use socket activation with MariaDB yet, but we do something very similar. If anything, it takes longer:
- Run a stub that tries to connect before serving a PHP-based web page.
- Have the stub contact a service on the database host to relaunch the MariaDB instance if it's shut down.
- The service waits until the Unix socket is created on disk.
- The stub records into APC/APCu that the database is relaunched so the next hour of requests don't have to connect twice. (We never shut down instances that were launched under an hour ago.)
- The PHP-based web page connects and continues execution as normal.
The remote CLI has a similar stub to relaunch the database on access. If MariaDB supported socket activation, we could remove most of this complexity. I've also written a socket activation proxy [1] that may be a better way than what we do today, but any proxy obfuscates the real client IP addresses, which breaks MariaDB/MySQL's automatic password intrusion prevention support.
Here's the my.cnf template [2] we use. You'll notice we disable fast shutdown in order to ensure fast(er) startup is possible.
[1] http://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html
[2] https://gist.github.com/davidstrauss/9141741
Some feedback on the first draft of the patch:
- It's unclear why the Also=mysqld.socket directive is in the service file. This should really be Requires=mysqld.socket (the Before=mysqld.service in the socket unit is implicit). It's not good practice to use Also= here because, then, manually starting the service using systemctl start mysqld.service won't work properly if the socket isn't already running.
- Use of socket activation doesn't necessarily mean skip-networking is appropriate. For example, the instance running may be a slave and need to connect to a master. Replication is not compatible with skip-networking, at least in MySQL.
- Setting the I/O priority to real-time may not be safe. It's much safer for an administrator to drop the BlockIOWeight= of other control groups and leave/set MariaDB to BlockIOWeight=1000. This ensures that MariaDB gets very high priority access to I/O without potentially starving everything else (even admin tools).
- Group=mysql is probably redundant. With User=mysql and an unspecified Group=, the group defaults to the default group of user mysql.
- I assume the notify integration is working properly, but if you have issues, consider setting NotifyAccess=all, just in case the notification is coming from something systemd does not consider the main process.
- It may be worth including PrivateNetwork=false (or documentation to that effect) because that option – while seemingly good for a socket-activated MariaDB not using replication – breaks Type=notify support, thereby mysteriously breaking the service.
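Pulling the review points above together, here is a hedged sketch of what a service unit incorporating them might look like. The paths and unit names are illustrative only, not the files from the patch:

```ini
# mariadb.service (illustrative fragment, not the shipped unit)
[Unit]
Description=MariaDB database server
# Requires= rather than Also=: starting the service pulls in the socket.
# The socket unit's implicit Before=mariadb.service covers ordering.
Requires=mariadb.socket
After=mariadb.socket

[Service]
Type=notify
User=mysql
# Group= left unset: it defaults to user mysql's primary group.
ExecStart=/usr/sbin/mysqld
# Favour a high BlockIOWeight over real-time I/O priority so other
# control groups (including admin tools) are never starved.
BlockIOWeight=1000
# Uncomment only if readiness notifications come from a non-main process:
#NotifyAccess=all
```

Note that skip-networking is deliberately absent, per the replication concern above.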
So here we have a completed patch to support systemd socket activation (and notification).
Thanks for the feedback David. The reason for Also= is "we just place an Also= line that makes sure that cups.path and cups.socket are automatically also enabled if the user asks to enable cups.service (they are enabled according to the [Install] sections in those unit files)." as per http://0pointer.de/blog/projects/socket-activation2.html. This seems to be still current based on examples in distribution.
It has a compile option WITH_SYSTEMD for those distros/OSs that don't have systemd. Even when compiled with systemd, socket activation will have no effect unless invoked from systemd with socket activation configured.
I'm happy to write a wiki page when this gets accepted.
This invokes mysqld directly; we reuse systemd options to perform the functions of mysqld_safe.
Using systemd notify (MDEV-5713), systemd knows when mysqld is running and taking connections, and can start other dependent services.
Although socket activation isn't strictly needed in production, WantedBy=multi-user.target means it runs at boot like a normal service, with the added benefit that if something early in boot needs it, it starts earlier.
Adding $OPTIONS seems to be a standard way that end users can alter /etc/systemd/system/mariadb.service.d/XXX.conf to override locations or settings.
I've used a STATUS notification on some things that cause slow startup, so sysadmins can see the progress.
There may be some improvements to the config as suggested by our distro maintainers. TODO: package spec/deb rules don't install support-files/mariadb.{service|socket} - wasn't sure how.
Is exit status 1 from mysqld something that should cause systemd to auto restart mysqld?
I'm hoping a working patch and the most voted on issue topic (systemd) is enough to change the goal to 10.0.
The reason for Also= is "we just place an Also= line that makes sure that cups.path and cups.socket are automatically also enabled if the user asks to enable cups.service (they are enabled according to the [Install] sections in those unit files)." as per http://0pointer.de/blog/projects/socket-activation2.html. This seems to be still current based on examples in distribution.
That blog post is ancient by systemd documentation standards. The current documentation states:
"Socket units will have a Before= dependency on the service which they trigger added implicitly. No implicit WantedBy= or RequiredBy= dependency from the socket to the service is added. This means that the service may be started without the socket, in which case it must be able to open sockets by itself. To prevent this, an explicit Requires= dependency may be added."
Is exit status 1 from mysqld something that should cause systemd to auto restart mysqld?
Yes, unless non-zero exit codes get explicitly listed in the service file as clean exits.
OK, reworked with Requires=. Removed MaxConnections= as that is only for Accept=true.
Thanks for the changes. Just re-reviewed.
My only additional feedback: Requires= must be in the Unit section. Otherwise, systemd will ignore it and warn.
ack. done.
Is exit status 1 from mysqld something that should cause systemd to auto restart mysqld?
The key word here is 'should'. I looked it up, and it seems exit status 1 indicates fatal config errors, for which a restart isn't useful.
So such a restart isn't attempted on exit status 1:
-SuccessExitStatus=0,1
+RestartPreventExitStatus=1
> -SuccessExitStatus=0,1
> +RestartPreventExitStatus=1
Indeed, this is the right configuration.
I'll amend my comment: "Yes, unless non-zero exit codes get explicitly listed in the service file as clean exits or exits where a restart is useless."
Since the last patch: corrected mariadb-systemd-start to do proper checks and use defaults if not set.
systemd didn't like mysqld doing a shutdown on the listening sockets, so I disabled that for systemd sockets to make systemd happy.
Added dh-systemd to debian/ubuntu depends in control file and attempted to set dh_systemd_enable/dh_systemd_start as per debian policy (still could be errors there).
Added support-files/mariadb-socket-convert, which takes a my.cnf config and generates a mariadb.socket.conf file based on the same network settings. Intended to help distro maintainers help users migrate across.
Unresolved, as far as I know, are some of the distro policies and packaging questions:
Generally:
Does mysqld using dbus now for notify mean a selinux policy needs to be added? (seems like an easy policy for those that need it http://blog.siphos.be/2014/06/d-bus-and-selinux/).
Is a ExecStartPre=/usr/bin/mariadb-systemd-start the right way to create a database if not installed?
should ListenStream=3306 be [::1]:3306 by default?
should [mysqld_safe] bits get converted for distros like the network settings in mariadb-socket-convert (hoping raising LimitNOFile=16k eliminates most surprises)?
Debian: how is /etc/mysql/debian-start (mysql-check/mysql_update) run? (may need a separate service that depends on mariadb.service - ExecStartPost may be too early)
rpm spec files need to set WITH_SYSTEMD=no for distros that don't use it.
Is a ExecStartPre=/usr/bin/mariadb-systemd-start the right way to create a database if not installed?
Yes. A separate service would be excessive and confusing.
should ListenStream=3306 be [::1]:3306 by default?
ListenStream=3306 will result in a combination IPv4/IPv6 listener on port 3306 for all interfaces. My hunch is that [::1]:3306 will create a listener on 127.0.0.1:3306 and [::1]:3306, but I haven't tested this. I think most distros have MariaDB installing, by default, with only a localhost listener.
should [mysqld_safe] bits get converted for distros like the network settings in mariadb-socket-convert (hoping raising LimitNOFile=16k eliminates most surprises)?
Do you have specific bits in mind?
Debian: how is /etc/mysql/debian-start run (mysql-check/mysql_update) run? (may need a separate service that depends on mariadb.service - ExecStartPost may be too early)
It's possible to create a "run after" service using a combination of WantedBy= and After=. WantedBy= means, if enabled, it gets added into any transaction where the MariaDB service starts (but does not imply ordering). After= ensures that it waits for MariaDB to be ready. The advantage of it being a separate service is making it clearly optional to administrators.
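A hedged sketch of such a "run after" unit, with hypothetical names (this is not a unit file from the patch):

```ini
# mariadb-debian-start.service (hypothetical name):
# runs the Debian post-start checks once MariaDB is ready.
[Unit]
Description=Debian post-start checks for MariaDB
# After= orders this unit behind MariaDB's readiness (Type=notify),
# so the script only runs once the server is accepting connections.
After=mariadb.service

[Service]
Type=oneshot
ExecStart=/etc/mysql/debian-start

[Install]
# WantedBy= pulls this unit into any transaction that starts MariaDB
# once the administrator enables it; ordering comes from After= above.
WantedBy=mariadb.service
```

Because activation is via the [Install] section, administrators can opt out with a plain systemctl disable, which is exactly the "clearly optional" property described above.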
ListenStream=[::1]:3306 ...
only does ipv6. Changed to:
ListenStream=[::1]:3306
ListenStream=127.0.0.1:3306
BindToDevice=
Backlog=150
Reworked mariadb-socket-convert to resolve all addresses of a hostname (like localhost) given as bind-address.
[mysqld_safe]...Do you have specific bits in mind?
support-files/mariadb.service documents a way to manually convert all of the [mysqld_safe] settings to mariadb.service settings. As really only open-file-limit/LimitNOFILE and timezone affect the running of MariaDB after an upgrade, I've put in a default 16K file limit to keep people out of trouble, and timezone isn't often set anyway.
I'll only want to write any conversion script here if there is a real demand and value.
I think I've gone down the path of attempted distro maintenance here as far as I want to without any specific guidance from distro maintainers, so I'll let them work out the rest. They have quite extensive policies that they are more familiar with, after all.
only does ipv6. Changed
If you have two ListenStream= options, systemd will send in two separate socket file descriptors. Is that handled in the code?
If you have two ListenStream= options, systemd will send in two separate socket file descriptors. Is that handled in the code?
Actually three, since the Unix socket is a ListenStream too. Up to 10 are handled within the code (just a #define), with non-fatal errors written to the MySQL error log if this limit is exceeded.
This reminded me that the current code gives extra_ip_sock (extra-port) special handling. How important is it to maintain this?
if (mysql_socket_getfd(sock) == mysql_socket_getfd(extra_ip_sock))
{
  thd->extra_port= 1;
  thd->scheduler= extra_thread_scheduler;
}
This reminded me that the current code gives extra_ip_sock (extra-port) special handling. How important is it to maintain this?
I've never been a fan of the specialized handling of the IP versus the Unix socket. It would be great to have a generic pool of listeners; I see no downside.
More fixes. I dare say this is getting pretty close.
Using systemd's listed fds for ip_sock, extra_ip_sock and unix_socket wasn't working, as the socket type needed to be determined. Because systemd can specify multiple sockets in any order, a proper map of socket type to file descriptor was made.
The performance schema interface was also missing; this is now corrected. For the moment I've put in PS event types for systemd_unix, systemd_ipv6 and systemd_ipv4, but these can probably be expanded to include path/host/port/number information if desired. Although the code for HAVE_POLL=no is there, I couldn't get past MDEV-7473, which stalled on Linux even with -DHAVE_SYSTEMD=no -DHAVE_POLL=no; but hey, Linux always has poll and systemd isn't ported to non-Linux, so we're safe for now.
The socket activation part already implements the MDEV-6536 IPv6 bind address, because systemd handles the opening of sockets; with a little more config parsing of these new structures, this sets the basis for a more extensive listening interface.
The main missing bit so far: a decent automated test case. Nothing non-root exists in systemd for this; however, it shouldn't be too hard to open a few sockets, set some env vars and exec another process.
Added https://github.com/MariaDB/server/pull/83 to merge to 10.1 incorporating review comments
Removing from 10.1 backlog and lowering priority until new pull request is submitted.
Generic systemd functionality has been incorporated for a while. As such I've stalled in getting the socket activation included. At this point I'd like to re-ask how much interest is there currently for socket activation? For what use cases? If there is an upper number of listening sockets what limit?
> At this point I'd like to re-ask how much interest is there currently for socket activation? For what use cases?
Let's review the original list I posted on the issue.
> Cleaner restarts (the listener socket stays open persistently)
Still true unless MariaDB has added an nginx-style method of restarting/reloading, which itself uses socket inheritance (just not via systemd).
> Network namespace isolation, disallowing any network access beyond the inherited listener port (and connections accepted from it).
Still true, and network namespace isolation would have helped mitigate vulnerabilities like CVE-2016-6662 by not allowing malicious code to open new listeners on the host or initiate outbound connections.
> Lazy startup for densely hosted instances. (It's also possible with socket activation to start it eagerly, as usual.)
Still true, though admittedly an edge case. We still start MariaDB instances with our own on-demand logic, but socket activation would be cleaner.
> Running MariaDB on privileged ports without having to start it initially as root
Still true, but a very limited use case given the standard ports for MySQL/MariaDB.
> Non-racy startup for services (like a PHP site) that depend on connecting to MariaDB. Because systemd opens listener sockets early in boot, they're available even while MariaDB is starting
If MariaDB is now properly integrating with systemd via Type=notify or Type=forking with PIDFile= set, then this is no longer useful.
> Deeper integration into coming network support in future systemd releases
There have been a few things added to systemd's network and namespace support, including JoinsNamespaceOf=, which allows running a backend like memcached and having it accessible to MariaDB from its namespace. Nothing too groundbreaking, though.
> If there is an upper number of listening sockets what limit?
There are limits in the kernel on file descriptor counts, but they're very high (and consumed by MariaDB opening its own sockets just as much).
From the systemd side, admins and packagers can configure any number of sockets to be opened and passed into MariaDB via systemd. The configured sockets will appear to MariaDB with file descriptors numbered in a way corresponding to the listeners configured in mariadb.socket.
davidstrauss Thanks for the feedback. I'll take this on board. Notes:
Cleaner restarts - not implemented in MariaDB - http://nginx.org/en/docs/control.html - wow, quite involved, and additionally so with a single process and Type=notify. Sending file descriptors and exec(2) using shm for data provide partial solutions. Perhaps this should be a separate MDEV (if one doesn't exist).
In this implementation and discussion there has been nothing mentioned on auto-deactivation. Is this a requirement in densely hosted instances? If so create a separate JIRA issue.
Something like a max_idle_execution time, maybe. There are avenues for integration with RuntimeMaxSec= and extending the timeout at runtime, and/or simply terminating the listening execution loop if no new connections are happening and no queries are running.
But please, new issue and state the requirement first.
Documentation will be coming in the next week or so https://mariadb.com/kb/en/systemd/
In this implementation and discussion there has been nothing mentioned on auto-deactivation. Is this a requirement in densely hosted instances?
I wouldn't make it a requirement for this issue, even if it's an interesting capability. The socket activation should already mean that it's possible to shut down a MariaDB service that's about to get a new connection and have everything still work. A shutdown immediately followed by a connection attempt should cause systemd's own handling of the socket to enqueue the necessary work to re-launch the service. On the client side, the only thing that should be noticeable is a delay in the initial connection (while MariaDB starts).
After MDEV-5536, it should be possible to analyze MariaDB activity however one chooses and send "systemctl stop something.service" while being confident that they're not causing system breakage if they anticipated wrongly (and a new connection or query arrives after the analysis or decision to shut down).
So, self shutdown on idleness would be a complementary feature, but it's pretty separate. Socket activation is the main thing that unblocks the "MariaDB farm" use case because that part is what you can't easily do externally.
I agree that a separate issue is warranted.
I've posted a separate issue for automatic shutdown when idle: MDEV-25282
Just for anybody who is not willing to read the whole passionate discussion at (1), I'd like to point out the comment #62 there (2), which summarizes the main cons of using socket activation for a database server. I'd also add another comment, that according at least Fedora guidelines, socket-activated services need to be autostart (3), which is not something we wish from database server, and worth noting that other distros can have similar requirements.
That having in mind, I don't think it's worth trying to implement socket activation now, unless there is some special use case for such a feature.
(1) https://bugzilla.redhat.com/show_bug.cgi?id=714426
(2) https://bugzilla.redhat.com/show_bug.cgi?id=714426#c62
(3) https://fedoraproject.org/wiki/Packaging:Systemd#Socket_activation