[MDEV-11046] systemd loses control of running daemon after RPM upgrade Created: 2016-10-13  Updated: 2019-06-06  Resolved: 2019-04-02

Status: Closed
Project: MariaDB Server
Component/s: Packaging
Affects Version/s: 10.1.18
Fix Version/s: 10.4.4

Type: Bug Priority: Major
Reporter: Jonathan Gazeley Assignee: Sergei Golubchik
Resolution: Duplicate Votes: 0
Labels: packaging
Environment:

CentOS 7


Issue Links:
Duplicate
is duplicated by MDEV-10797 RPM includes init script and a system... Closed
Relates
relates to MDEV-11045 RPM should execute systemctl daemon-r... Closed
relates to MDEV-19706 RPM no longer installs init script on... Closed

 Description   

I receive MariaDB updates from the mariadb-enterprise-main repo. Yesterday I updated from 10.1.17 to 10.1.18 using yum but after the upgrade, system no longer has control of the running daemon.

The workaround I've found is to stop the daemon with the legacy init script (which shouldn't exist on EL7, see MDEV-10797) and then restart it with the systemd unit. After that, systemd has control again. This is a dangerous workaround as it requires taking down a live server.

I have also reported MDEV-11045 about the systemctl daemon-reload issue, which may or may not be related.

[jg4461@db2-dev ~]$ sudo cat /var/log/yum.log | grep Maria
Oct 12 06:20:32 Updated: MariaDB-common-10.1.18-1.el7.centos.x86_64
Oct 12 06:20:39 Updated: MariaDB-client-10.1.18-1.el7.centos.x86_64
Oct 12 06:20:58 Updated: MariaDB-server-10.1.18-1.el7.centos.x86_64
Oct 12 06:20:58 Updated: MariaDB-shared-10.1.18-1.el7.centos.x86_64
[jg4461@db2-dev ~]$ sudo systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf, timeout.conf
   Active: failed (Result: exit-code) since Thu 2016-10-13 09:16:21 UTC; 31min ago
  Process: 7403 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 5247 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 5148 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=1/FAILURE)
  Process: 5146 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 5247 (code=exited, status=0/SUCCESS)
 
Oct 13 09:14:10 db2-dev.nomadic-core.bris.ac.uk systemd[1]: Starting MariaDB database server...
Oct 13 09:14:10 db2-dev.nomadic-core.bris.ac.uk sh[5148]: 2016-10-13  9:14:10 140368927991936 [Warning] option 'wsrep_max_ws_size': unsigned value 2147483648 adjusted to 2147483647
Oct 13 09:14:10 db2-dev.nomadic-core.bris.ac.uk sh[5148]: 2016-10-13  9:14:10 140368927991936 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB-enterprise) starting as process 5612 ...
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk sh[5148]: WSREP: Failed to start mysqld for wsrep recovery: '2016-10-13  9:14:10 140368927991936 [ERROR] mysqld: Can't lock aria control file '/v...r 30 seconds
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk sh[5148]: 2016-10-13  9:14:41 140368927991936 [ERROR] mysqld: Got error 'Could not get an exclusive lock; file is probably in use by another proc...log_control'
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk sh[5148]: 2016-10-13  9:14:41 140368927991936 [ERROR] Plugin 'Aria' init function returned error.
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk systemd[1]: mariadb.service: control process exited, code=exited status=1
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk systemd[1]: Failed to start MariaDB database server.
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk systemd[1]: Unit mariadb.service entered failed state.
Oct 13 09:16:21 db2-dev.nomadic-core.bris.ac.uk systemd[1]: mariadb.service failed.
Warning: mariadb.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Hint: Some lines were ellipsized, use -l to show in full.
[jg4461@db2-dev ~]$ ps -ef | grep mysql
root      4480     1  0 Oct12 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql/data --pid-file=/var/lib/mysql/data/db2-dev.nomadic-core.bris.ac.uk.pid
mysql     5082  4480  0 Oct12 ?        00:02:52 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql/data --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep_on=on --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --log-error=/var/log/mysqld.log --open-files-limit=65535 --pid-file=/var/lib/mysql/data/db2-dev.nomadic-core.bris.ac.uk.pid --socket=/var/lib/mysql/mysql.sock --port=3306 --wsrep_start_position=01277571-0c67-11e6-961c-32aa2e838dbc:304
jg4461    9415  8089  0 09:48 pts/0    00:00:00 grep --color=auto mysql
[jg4461@db2-dev ~]$ sudo /etc/init.d/mysql stop
Shutting down MySQL....                                    [  OK  ]
[jg4461@db2-dev ~]$ sudo systemctl start mariadb
Warning: mariadb.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[jg4461@db2-dev ~]$ sudo systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf, timeout.conf
   Active: active (running) since Thu 2016-10-13 09:48:40 UTC; 12s ago
  Process: 10054 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 9470 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 9467 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 9977 (mysqld)
   Status: "Taking your SQL requests now..."
   CGroup: /system.slice/mariadb.service
           └─9977 /usr/sbin/mysqld --wsrep_start_position=01277571-0c67-11e6-961c-32aa2e838dbc:304
 
Oct 13 09:48:36 db2-dev.nomadic-core.bris.ac.uk systemd[1]: Starting MariaDB database server...
Oct 13 09:48:37 db2-dev.nomadic-core.bris.ac.uk sh[9470]: 2016-10-13  9:48:37 140270106884224 [Warning] option 'wsrep_max_ws_size': unsigned value 2147483648 adjusted to 2147483647
Oct 13 09:48:37 db2-dev.nomadic-core.bris.ac.uk sh[9470]: 2016-10-13  9:48:37 140270106884224 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB-enterprise) starting as process 9933 ...
Oct 13 09:48:39 db2-dev.nomadic-core.bris.ac.uk sh[9470]: WSREP: Recovered position 01277571-0c67-11e6-961c-32aa2e838dbc:304
Oct 13 09:48:39 db2-dev.nomadic-core.bris.ac.uk mysqld[9977]: 2016-10-13  9:48:39 140308169877632 [Warning] option 'wsrep_max_ws_size': unsigned value 2147483648 adjusted to 2147483647
Oct 13 09:48:39 db2-dev.nomadic-core.bris.ac.uk mysqld[9977]: 2016-10-13  9:48:39 140308169877632 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB-enterprise) starting as process 9977 ...
Oct 13 09:48:40 db2-dev.nomadic-core.bris.ac.uk systemd[1]: Started MariaDB database server.
Warning: mariadb.service changed on disk. Run 'systemctl daemon-reload' to reload units.



 Comments   
Comment by Daniel Black [ 2018-02-04 ]

since the systemd service doesn't (and never has) use(d) mysqld_safe, and your process listing shows that mysqld is the child process of mysqld_safe, I suspect this is a direct consequence of MDEV-10797 as the sysv init scripts still use it.

The higher process number of the systemd status output indicated those where started a while ago. How systemd was tricked into thinking it still has an active service without its child, process 9977, being active I have no idea, especially as systemd would of received a signal when process 9977 died.

I'm really hoping the changes in MDEV-11045 (and finally closing MDEV-10797) add sufficient rigour around the upgrade to prevent this reoccurring. I await confirmation, good or bad, once these have been merged.

Generated at Thu Feb 08 07:46:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.