[MDEV-10382] Using systemd, mariadb doesn't restart on crashes Created: 2016-07-16 Updated: 2016-12-07 Resolved: 2016-12-06 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Platform Debian, Scripts & Clients |
| Affects Version/s: | 10.1 |
| Fix Version/s: | 10.1.20 |
| Type: | Bug | Priority: | Critical |
| Reporter: | jocelyn fournier | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Hi, After switching to Debian Jessie, which uses /etc/systemd/system/mariadb.service instead of mysqld_safe, MariaDB doesn't restart after a crash (official apt package, MariaDB 10.1.14). Thanks, |
| Comments |
| Comment by jocelyn fournier [ 2016-07-18 ] | ||
|
It would also be great to add EnvironmentFile=/etc/sysconfig/mysql to this file and to create an empty /etc/sysconfig/mysql file. | ||
| Comment by Andrii Nikitin (Inactive) [ 2016-11-03 ] | ||
|
I am not an expert in systemd, but this request doesn't look valid to me. Please confirm whether the conclusions in this report are based on some source or solely on personal experience. Maybe a restart timeout caused the misunderstanding? (It is 5 seconds by default on my Ubuntu 16.04.) | ||
| Comment by jocelyn fournier [ 2016-11-03 ] | ||
|
Hi Andrii, It's based only on personal experience: MariaDB was running properly, then it crashed and didn't restart (I reproduced this behaviour several times). BTW, on-failure is the recommended setting: "Setting this to on-failure is the recommended choice for long-running services, in order to increase reliability by attempting automatic recovery from errors." | ||
| Comment by Sergey Vojtovich [ 2016-11-03 ] | ||
|
We used to have on-failure but had to switch to on-abort for a reason, see comment in mariadb.service:
| ||
| Comment by Sergey Vojtovich [ 2016-11-03 ] | ||
|
If all you worry about is crashes not being caught by systemd, then I'd say it is a systemd bug, and it is better to have it fixed there. OTOH, I seem to remember that InnoDB may use exit() instead of abort(). In that case the crash may not be handled properly and we need to fix this in MariaDB. See | ||
| Comment by jocelyn fournier [ 2016-11-03 ] | ||
|
Not having mysql restarted after a crash causes downtime, which is not good. The switch from mysqld_safe (which handled crashes properly) to systemd is fairly recent in the Debian Jessie package, AFAIK? | ||
| Comment by Sergey Vojtovich [ 2016-11-03 ] | ||
|
Not having mysqld restarted after a crash is certainly no good. But I'm afraid switching from on-abort to on-failure is no good either. mysqld_safe distinguishes crashes from a normal shutdown by checking whether the pid file exists. I doubt systemd is flexible enough to perform such a check. Even if it were, I'd do my best to avoid this. If you could provide the mysqld error log from one of these crashes, we could probably come up with a better solution. | ||
| Comment by jocelyn fournier [ 2016-11-03 ] | ||
|
I think the crash was caused by https://jira.mariadb.org/browse/MDEV-10410 | ||
| Comment by Sergey Vojtovich [ 2016-11-03 ] | ||
|
According to man systemd.service:
Not sure how systemd determines "uncaught signal", but we definitely catch all signals and after handling those we call exit(). | ||
| Comment by Andrii Nikitin (Inactive) [ 2016-11-04 ] | ||
|
OK, if mysqld suppresses all signals (in signal_handler.cc) and calls exit(1);, then Restart=on-abort is certainly the wrong configuration here (because technically mysqld suppresses every signal that can be suppressed). The closest behavior to mysqld_safe would be Restart=always. To avoid mysqld being restarted over and over when it fails because of an incorrect option, we should return a special exit code for configuration errors, e.g. 203, and configure RestartPreventExitStatus=203. So I am setting the status to 'Confirmed' and raising the priority to 'Critical'. | ||
| Comment by Daniel Black [ 2016-11-28 ] | ||
|
Or make the signal handler raise the signal again after removing the handlers and before exit. systemd will get the child signal from the waitid(2). | ||
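A minimal sketch of the difference between the two handler styles discussed here, as seen by a supervising parent process. All names (`handler_exit`, `handler_reraise`, `crash_child_and_report`) are hypothetical and this is not the code in signal_handler.cc; it only assumes a POSIX system. The parent uses waitpid(2) rather than waitid(2), but both report the same distinction between a normal exit and death by signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Style described in the thread: catch the crash signal, then exit.
   The parent only ever sees a normal exit with status 1. */
void handler_exit(int sig)
{
    (void)sig;
    _exit(1);
}

/* Suggested alternative: restore the default action and raise the
   signal again, so the process is really terminated by the signal. */
void handler_reraise(int sig)
{
    sigset_t set;

    signal(sig, SIG_DFL);
    /* The signal is blocked while its handler runs; unblock it so the
       re-raised signal is delivered immediately. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    raise(sig);
    _exit(1); /* only reached if the re-raise did not terminate us */
}

/* Forks a child that installs `handler` for SIGSEGV and then simulates
   a crash with raise(SIGSEGV). Returns the terminating signal number
   if the child was killed by a signal, 0 if it exited normally, and
   -1 on error. */
int crash_child_and_report(void (*handler)(int))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        struct sigaction sa;

        setrlimit(RLIMIT_CORE, &no_core); /* avoid core files in the demo */
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        raise(SIGSEGV);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With `handler_exit` the function returns 0 (the child "exited", so a supervisor configured to react only to signals has nothing to react to); with `handler_reraise` it returns `SIGSEGV`.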
| Comment by Sergei Golubchik [ 2016-11-30 ] | ||
|
Agreed, it's probably safest to use Restart=on-failure and RestartPreventExitStatus to fine-tune which errors should cause a restart. | ||
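As an illustration only, the combination could look like the drop-in below. The file name is hypothetical, and 203 is merely the example value suggested in the earlier comment; which exit status mysqld would actually use for configuration errors is an assumption, not something this report states.

```ini
# /etc/systemd/system/mariadb.service.d/restart.conf  (hypothetical drop-in name)
[Service]
# Restart after an unclean exit code or termination by a signal,
# but not after a clean shutdown.
Restart=on-failure
# Assumption: mysqld exits with this status on a configuration error,
# where restarting would only fail again.
RestartPreventExitStatus=203
```

After adding or changing a drop-in, `systemctl daemon-reload` makes systemd pick it up.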
| Comment by Marko Mäkelä [ 2016-12-07 ] | ||
|
InnoDB calls exit() from I/O threads in certain situations. It would probably be better to call abort() to be consistent with other cases where InnoDB aborts when encountering a corrupted page. |