[MDEV-18450] slow master shutdown to ensure slaves have received all its event Created: 2019-02-01  Updated: 2023-05-16  Resolved: 2019-03-12

Status: Closed
Project: MariaDB Server
Component/s: Replication
Fix Version/s: 10.4.4

Type: Task
Priority: Major
Reporter: Andrei Elkin
Assignee: Andrei Elkin
Resolution: Fixed
Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-19847 Update mysqladmin man page Closed
relates to MXS-2620 Document that shutting down the maste... Closed

 Description   

When a master server shuts down, its slaves stop receiving events from it at an arbitrary
point in the replication event stream (for example, at an arbitrary GTID). Specifically, it is not
guaranteed that all replication events have been sent to a slave. Such a slave therefore cannot
be automatically promoted to master.

It must be possible to defer the critical shutdown operations until all (or some) of the currently connected slaves have received the last event from the master binlog. Once client connections are closed, so that no more data is generated for replication, the sending (or, even better, the receipt) of the last event should release the final phase of shutdown.
A master stopped in this slow way is guaranteed to have an up-to-date slave that was fed the last master binlog event.



 Comments   
Comment by Geoff Montee (Inactive) [ 2019-02-15 ]

I have been working with someone who experienced this exact problem with semi-synchronous replication. They noticed that when they trigger a normal shutdown on the master, some data can still be committed by existing client threads after the semi-synchronous replication master threads have been shut down.

Comment by Andrei Elkin [ 2019-02-15 ]

Sergei, hello.

I've submitted patch 80961a70f22 to bb-10.4-andrei.
It may still lack some design elements, such as warnings to stderr if the time to catch up exceeds some limit.
Yet it implements the core, including the changes in the client, and it has passed my smoke tests,
including mtr suite runs in the "slow" mode.

I hope to hear from you when your busy schedule permits.

Thanks!

Andrei

Comment by Geoff Montee (Inactive) [ 2019-02-15 ]

Hi Elkin,

I noticed that the patch prevents the server from killing the binlog dump thread early on the master, which sounds great.

For semi-synchronous replication, does anything need to be changed to prevent the server from killing the ack receiver thread early on the master as well? Currently, it looks like this might be killed after the binlog dump thread, and before the storage engines shut down:

2019-02-01 12:45:07 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
2019-02-01 12:45:07 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-02-01 12:45:07 0 [Note] InnoDB: FTS optimize thread exiting.
2019-02-01 12:45:07 128 [Note] Stop semi-sync binlog_dump to slave (server_id: 50882876)
2019-02-01 12:45:07 112 [Note] Stop semi-sync binlog_dump to slave (server_id: 173730129)
2019-02-01 12:45:08 6 [Note] Stopping ack receiver thread
190201 12:45:08 server_audit: STOPPED
2019-02-01 12:45:08 0 [Note] InnoDB: Starting shutdown...
2019-02-01 12:45:08 0 [Note] InnoDB: Dumping buffer pool(s) to /mariadb/data/ib_buffer_pool
2019-02-01 12:45:08 0 [Note] InnoDB: Buffer pool(s) dump completed at 190201 12:45:08
2019-02-01 12:45:09 0 [Note] InnoDB: Shutdown completed; log sequence number 1673557; transaction id 82
2019-02-01 12:45:09 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2019-02-01 12:45:09 0 [Note] /usr/sbin/mysqld: Shutdown complete

In the source code, I see it's currently stopped here:

https://github.com/MariaDB/server/blob/62c0ac2da66f8e26d5bbf79f3a7dac56cad34f5e/sql/semisync_master.cc#L1348

https://github.com/MariaDB/server/blob/62c0ac2da66f8e26d5bbf79f3a7dac56cad34f5e/sql/mysqld.cc#L1973

Comment by Andrei Elkin [ 2019-02-18 ]

GeoffMontee, thank you for this inspiring question about semisync and its ack collector specifically.
It does look like a race is possible in which the master gains an extra trx, as a result of the following:

0. Master is committing a trx
1. client shuts down
2. Master has not yet sent the trx to the slave while having received the close user connection command
3. The master trx gets committed as part of the close user connections

Notice that, by my observation, the ack collector's role is completely ignored here.
It might as well be killed in a further step

4. Ack collector thread is killed

without changing the fact that the trx is on the master but missing on the slave.

I need to rework the patch so that the wait-for-slaves shutdown respects the semisync ack collector as well.

Actually, that conclusion did not take into account the very fixes that keep the dump thread alive
until binlog EOF is reached.
I may need to add a specific test case to demonstrate this scenario.
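
Until there is a real test case, the scenario can be modelled with a toy sketch. This is not the server's code: `ToyMaster`, `shutdown` and the plain lists standing in for the binlog and the slave's relay log are all hypothetical, and the semisync ack collector is left out entirely. The sketch only contrasts killing the dump thread before the in-flight trx commits with keeping it alive until binlog EOF.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of a master: a binlog plus one dump thread feeding one slave."""
    binlog: list = field(default_factory=list)
    dump_alive: bool = True
    sent: int = 0  # number of binlog events already sent to the slave

    def commit(self, trx):
        # A commit appends to the binlog whether or not a dump thread is alive.
        self.binlog.append(trx)

    def dump_step(self, slave):
        # The dump thread ships every not-yet-sent event, but only while alive.
        if self.dump_alive:
            slave.extend(self.binlog[self.sent:])
            self.sent = len(self.binlog)


def shutdown(master, slave, in_flight, wait_for_slaves):
    """Run the toy shutdown sequence; return the events the slave is missing."""
    if not wait_for_slaves:
        master.dump_alive = False  # old order: dump thread is killed first
    master.commit(in_flight)       # closing connections lets the trx commit
    master.dump_step(slave)        # no-op if the dump thread is already dead
    master.dump_alive = False      # new order: killed only after binlog EOF
    return master.binlog[len(slave):]
```

With one trx already replicated and a second one in flight, `shutdown(..., wait_for_slaves=False)` returns the in-flight trx as missing on the slave, while `wait_for_slaves=True` returns an empty list.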

Cheers,
Andrei

Comment by Andrei Elkin [ 2019-03-07 ]

There is an open question at the end of the MDEV description and implementation notes.

MDEV-18450 Slaves wait shutdown

The patch features an optional shutdown behavior that holds the shutdown
until all connected slaves have been sent the last binlogged event.
A connected slave is one whose START SLAVE has been acknowledged and
that has not been stopped since, even though it may technically be
reconnecting in the background.

The solution therefore disallows killing the dump thread until it
has found EOF of the latest binlog file.
It is up to the shutdown requester (DBA) to set a sufficiently large
shutdown timeout value for the shutdown to wait patiently until
lagging slaves have been synchronized. On the other hand, if a
specific slave needs to be excluded from synchronization, the DBA has
to stop it manually, which terminates its dump thread.
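
The waiting rule described above can be sketched as a polling loop with a deadline. This is an illustration only, not the patch itself: `wait_for_slaves`, the integer positions and the `sent_positions` callback are hypothetical names, and the real server tracks dump threads rather than a dictionary. A manually stopped slave is modelled simply by its absence from the mapping.

```python
import time


def wait_for_slaves(binlog_end, sent_positions, timeout_s, poll_s=0.01,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until every connected slave was sent the last event, or time out.

    binlog_end     -- position of the last binlogged event (hypothetical int)
    sent_positions -- callable returning {slave_id: last position sent}
    Returns the set of slave ids still lagging when the wait ended.
    """
    deadline = clock() + timeout_s
    while True:
        # A slave is lagging while its dump thread has not reached binlog EOF.
        lagging = {sid for sid, pos in sent_positions().items()
                   if pos < binlog_end}
        # Stop when everyone caught up, or when the shutdown timeout expired.
        if not lagging or clock() >= deadline:
            return lagging
        sleep(poll_s)
```

An empty result means every connected slave has been sent the last event; a non-empty result names the slaves that were still behind when the timeout ran out, which is the case where the DBA chose too small a timeout.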

`mysqladmin shutdown' is extended with a `--wait_for_slaves' option
which translates to `SHUTDOWN wait_for_slaves' sql query
to enable the feature on the client side.

svoj has raised the question of whether it makes sense to introduce a dynamic
global server variable which would hold the desired shutdown policy to act upon
in case the server is killed externally rather than through an SQL command.
serg, valerii, could you please provide your judgment?

Andrei

Comment by Sergei Golubchik [ 2019-03-07 ]

Elkin, I'd say it's fine without a variable. Until somebody requests it and explains why he cannot use mysql -e 'shutdown wait_for_slaves'. (hint: not knowing the root password is not a good reason, as 10.4 uses unix_socket by default)

But please rename "wait_for_slaves" to "wait_for_all_slaves" and maybe drop the underscores; the SQL standard generally avoids them.

Comment by Andrei Elkin [ 2019-03-07 ]

GeoffMontee, thanks for the assessment and the future-proof idea! Personally, I also thought the global var can wait until it is requested. Hopefully, that's fine with you as well.

Comment by Geoff Montee (Inactive) [ 2019-03-07 ]

Hi Elkin,

I am working with a user who ran into this problem with a master that has semi-synchronous slaves.

If there is no global variable to start with, and this semi-synchronous master is shut down externally (i.e. via systemd), would it still be shut down safely by default? Or would that master also have to be manually shut down with something like "shutdown wait_for_all_slaves"?

These servers are in a containerized environment. These containers can be shut down automatically when they need to be moved to a different physical host for performance reasons. This is a completely automated process, so there is no DBA involved to execute a command like "shutdown wait_for_all_slaves". This user would much prefer to ensure that the semi-synchronous master is shut down safely by default, without manually executing some SQL command.

Comment by Andrei Elkin [ 2019-03-07 ]

GeoffMontee, regarding

>there is no DBA involved to execute a command like "shutdown wait_for_all_slaves"
Killing the server process is harsh, so dump threads may not finish sending, and therefore
there is no seamless failover.

I read this as the server option having just been requested, and I will pass that on to the architects.

Comment by Geoff Montee (Inactive) [ 2019-03-07 ]

Hi Elkin,

The server process isn't killed in the case that I'm talking about. It's shut down gracefully by the system, similar to how it would be if someone rebooted their database server like this:

shutdown -r now

The server process would be shut down gracefully by systemd.

Comment by Andrei Elkin [ 2019-03-07 ]

GeoffMontee does this specific container shutdown have any hook where mysqladmin could run?

Comment by Geoff Montee (Inactive) [ 2019-03-07 ]

Hi Elkin,

He said that they can use a Kubernetes PreStop hook:

https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/

So he is OK with the non-safe shutdown being the default behavior.
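
For a hook of that kind, a minimal sketch of a script that requests the waiting shutdown could look like the following. It assumes the spelling that MariaDB documents for the released feature, `mysqladmin --wait-for-all-slaves shutdown` (the SQL form being `SHUTDOWN WAIT FOR ALL SLAVES`), rather than the working names used earlier in this thread, so it should be checked against the server version in use. The helper names `shutdown_argv` and `run_prestop` are hypothetical, and credentials are assumed to come from unix_socket authentication or an option file.

```python
import subprocess


def shutdown_argv(wait_for_all_slaves=True):
    """Build the mysqladmin command line for a graceful shutdown."""
    argv = ["mysqladmin"]
    if wait_for_all_slaves:
        # Assumed released spelling of the option discussed in this issue.
        argv.append("--wait-for-all-slaves")
    argv.append("shutdown")
    return argv


def run_prestop(runner=subprocess.run):
    """Run the shutdown command; `runner` is injectable so it can be faked."""
    # check=True makes subprocess.run raise if mysqladmin exits non-zero.
    return runner(shutdown_argv(), check=True)
```

The hook would call `run_prestop()` and has to finish within whatever grace period the container runtime allows, so that period plays the same role as the shutdown timeout mentioned above.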

Comment by Andrei Elkin [ 2019-03-12 ]

Pushed 3568427d11f to 10.4.4.

Comment by Geoff Montee (Inactive) [ 2019-08-01 ]

I just documented this behavior: https://mariadb.com/kb/en/library/replication-threads/#binary-log-dump-threads-and-the-shutdown-process

Generated at Thu Feb 08 08:44:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.