[MDEV-13376] Stopping mariadb.service on joiner node during SST does not stop state transfer Created: 2017-07-23 Updated: 2017-08-22 Resolved: 2017-08-22 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Galera SST, Scripts & Clients |
| Affects Version/s: | 10.2.7 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Juha Pyy | Assignee: | Andrii Nikitin (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | galera, sst, systemd | ||
| Environment: | Debian jessie |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Example cluster with only two nodes (normally it would also have a third node), using wsrep_sst_method=mysqldump. The DONOR node was started with "galera_new_cluster"; the JOINER node was started with "systemctl start mariadb.service" after deleting grastate.dat to make sure the node needs an SST. The JOINER was then stopped with "systemctl stop mariadb.service" while the SST was still running.
systemctl reports no errors, so you would assume the node has been stopped successfully. In reality, the SST keeps running (confirmed with "ps aux | grep mysql") until it has fully finished, which can take a long time depending on database size. Only then does the JOINER node complete its shutdown and the DONOR finally return to the SYNCED state.
Since the shutdown was not actually completed and systemctl returned no error, the user could try running "systemctl start mariadb.service" and end up with the new mysqld process failing to start. (I didn't try this, but I assume it would fail gracefully as at any other time.)
When I run "systemctl stop", I would expect the node to be stopped (and thus the SST to be interrupted) so that the donor can return to the SYNCED state as soon as possible. When an SST is interrupted, the joining node's database consistency doesn't really matter, because the node will get a new SST anyway the next time it is started. If I were stopping a node in the middle of an IST, the current "non-interrupting" behaviour could be preferable, as it may avoid an SST on the next startup on a busy cluster. (I didn't test this, but I assume an IST currently isn't interrupted either.)
Attached logs from both nodes show what happened.
TL;DR: I would suggest that the systemctl command should at least return a failure notice similar to the one shown when startup fails.
The user would then immediately know to check the status, which correctly shows that something has failed.
It would also be nice to get a log message stating what happens to the SST (or IST), e.g. "waiting until SST has finished" or "SST interrupted, node in inconsistent state, new SST required". |
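Until systemctl itself reports the failure, the check described above can be scripted. The following is only a sketch: the function name `stop_and_verify` is hypothetical, and the default process name `mysqld` is an assumption taken from the "ps aux | grep mysql" check in this report. It stops the unit and then fails if the unit is still active or a server process is still alive.

```shell
#!/bin/sh
# Hypothetical helper: stop a unit, then verify that nothing was left running.
# Returns 0 only if the unit is no longer active AND no matching process remains.
stop_and_verify() {
    unit=$1
    proc=${2:-mysqld}   # assumed process name; adjust for your installation

    # Ask systemd to stop the unit; give up if the request itself fails.
    systemctl stop "$unit" || return 1

    # is-active exits 0 while the unit is still in the "active" state.
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is still active after stop" >&2
        return 1
    fi

    # pgrep -x exits 0 if a process with exactly this name exists.
    if pgrep -x "$proc" >/dev/null; then
        echo "$proc is still running (a state transfer may be in progress)" >&2
        return 1
    fi

    return 0
}
```

It could be called as `stop_and_verify mariadb.service`. A non-zero exit status then signals that `systemctl status mariadb.service` and the process list should be inspected before the node is started again.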
| Comments |
| Comment by Andrii Nikitin (Inactive) [ 2017-08-18 ] | ||
|
Thank you for the report. I was told on the systemd channel on freenode that you are always expected to check `systemctl status` after `systemctl stop`. As I understand it, this topic is still under development/discussion in the systemd project, and I wasn't able to find any reliable guidelines on it.
Side note: consider configuring SendSIGKILL=yes in the unit file, especially if nodes can easily be discarded and rebuilt from a fresh SST. That should work around the original problem, because the resulting SIGKILL (the equivalent of `kill -9`) will terminate the SST. This issue will be closed as a duplicate of MDEV-13580 (so developers don't need to deal with SIGTERM and systemd problems at the same time). |
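The SendSIGKILL suggestion could be applied with a systemd drop-in instead of editing the packaged unit file. This is a minimal sketch, assuming the unit is named mariadb.service. The drop-in file name override.conf and the timeout value are illustrative assumptions, not something this report specifies.

```ini
# /etc/systemd/system/mariadb.service.d/override.conf  (assumed path and file name)
[Service]
# Allow systemd to send SIGKILL to processes that are still running
# once the stop timeout has expired.
SendSIGKILL=yes
# Illustrative value only: seconds systemd waits after the stop signal
# before it escalates to SIGKILL.
TimeoutStopSec=300
```

After the file is created, `systemctl daemon-reload` makes systemd pick up the change. A SIGKILL during SST leaves the joiner's data directory in an undefined state, so this only suits nodes that can be rebuilt from a fresh SST.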