[MDEV-9622] MariaDB client connection hangs when semi-sync is enabled but not connected Created: 2016-02-24  Updated: 2016-04-08  Resolved: 2016-04-08

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Joseph Glanville Assignee: Kristian Nielsen
Resolution: Done Votes: 0
Labels: None
Environment:

Ubuntu 14.04, official MariaDB packages.



 Description   

When using semi-sync replication with the master configured to block writes until at least one semi-sync slave has acknowledged them, with a long (or infinite) timeout, there appears to be a bug when inserting data while the slave is disconnected.

Specifically, if a write query is executed it blocks as expected; however, once the slave is connected the client connection is not unblocked (and never receives any response indicating success or failure of the query). Stranger still, the query is actually executed successfully when the slave connects, and the data is properly replicated.
As a result the client connection hangs forever, which is a poor failure mode: it is impossible to determine whether the write succeeded or not.

You can reproduce this by doing the following.

Start a master server.

Enable semi-sync replication with the following:

CREATE USER 'flynn'@'%' IDENTIFIED BY 'flynn';
GRANT ALL ON *.* TO 'flynn'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_wait_point = "AFTER_SYNC";
SET GLOBAL rpl_semi_sync_master_timeout = 18446744073709551615;
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_wait_no_slave = 1;

Execute a query that writes data. The bug is reproducible with both DDL and DML; CREATE DATABASE is probably the easiest option. Note that your client will now hang. This applies to any connector, as well as the mysql command-line client.
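While the client is hung, a second session can confirm the master's semi-sync state. A quick check using the status variables exposed by the semisync_master plugin (the expected values assume the setup above):

```sql
-- From a second session while the first client is blocked:
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';        -- ON: semi-sync is active
SHOW STATUS LIKE 'Rpl_semi_sync_master_clients';       -- 0: no semi-sync slave connected
SHOW STATUS LIKE 'Rpl_semi_sync_master_wait_sessions'; -- 1: the hung client's session
```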

Create a slave from a physical backup of the master, setting the slave's GTID position to the master's GTID (or the GTID as of the backup, if the master has progressed since):

INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
SET GLOBAL gtid_slave_pos = 'MASTER_GTID';
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=13306, MASTER_USER='flynn', MASTER_PASSWORD='flynn', MASTER_CONNECT_RETRY=10, MASTER_USE_GTID=current_pos;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
START SLAVE;

The master should now be writable again, but your initial write should still be hung.
However, if you connect another session you should see that your statement has actually executed.
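To verify the state on both sides at this point, something like the following can be used (illustrative only):

```sql
-- On the slave: semi-sync and the replication threads should be running.
SHOW STATUS LIKE 'Rpl_semi_sync_slave_status';  -- expect ON
SHOW SLAVE STATUS\G                             -- Slave_IO_Running / Slave_SQL_Running: Yes
-- On the master, from a fresh session: the database created by the
-- still-hung CREATE DATABASE should already be visible.
SHOW DATABASES;
```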

Let me know if any additional information would help; happy to do the legwork of running debug builds etc.



 Comments   
Comment by Joseph Glanville [ 2016-02-24 ]

Playing with this some more suggests this is only a problem for single-threaded / single-connection applications.

Because of how the semi-sync logic works, sleeping threads are only woken up when a new write is processed;
otherwise the full timeout is always observed.

In the case of an infinite timeout, this means that unless you do another write on another connection, your previous connection will never unblock.
With a shorter timeout, the timeout will expire, the master will switch semi-sync off (even though a semi-sync slave is available that has already replicated all pending entries), and the very next write will switch it back on.

The reason this matters is the semantics of the timeout. The following only applies to applications that don't do concurrent writes.
Ideally you want to be able to set the semi-sync timeout to the mean time to recovery of your slave in most scenarios, or to infinite if you never want to lose any data.
Doing either means your single-threaded application will always be blocked for the full duration of the timeout whenever the slave disconnects, which is untenable for obvious reasons.

Would there be a way to wake connections up when a new slave connects, or better yet when a position greater than the wait position has been transferred to a slave?
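In the meantime, a crude workaround consistent with the behaviour described above is to issue a dummy write from another connection once the slave has reconnected and caught up; its semi-sync ACK also wakes the earlier waiters. A sketch (the heartbeat table is a made-up name, not anything MariaDB provides):

```sql
-- From any other connection, after the semi-sync slave is back:
CREATE TABLE IF NOT EXISTS test.semisync_heartbeat (
  id INT PRIMARY KEY,
  ts TIMESTAMP
);
REPLACE INTO test.semisync_heartbeat VALUES (1, NOW());
```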

Comment by Joseph Glanville [ 2016-02-24 ]

Another reasonable solution would be an option to change the behaviour on timeout from switching semi-sync off to returning an error to the client and rolling back the transaction.
This would eliminate the need for an infinite timeout to enforce data safety.

Comment by Joseph Glanville [ 2016-02-24 ]

I have found what I believe to be the root cause: when using purely GTID-based replication, the initial binlog dump does not wake up waiting sessions, because the wake-up relies on a comparison of binlog file names/positions, which may not be set by CHANGE MASTER when using pure GTID replication.

Changing my test case to also set MASTER_LOG_FILE and MASTER_LOG_POS worked around the issue.
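For reference, the workaround can be sketched as follows; the binlog file name and position shown are hypothetical placeholders, which in practice would come from SHOW MASTER STATUS on the master or from the xtrabackup_binlog_info file in the backup:

```sql
-- On the master (or from the backup metadata): note the coordinates.
SHOW MASTER STATUS;

-- On the slave: also pass explicit file/position so the semi-sync
-- position comparison has something to work with.
-- 'mariadb-bin.000001' / 4 are placeholders.
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=13306,
  MASTER_USER='flynn', MASTER_PASSWORD='flynn',
  MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4,
  MASTER_USE_GTID=current_pos;
START SLAVE;
```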

Should I update the original issue to reflect the root cause?

Comment by Elena Stepanova [ 2016-02-24 ]

josephglanville,

When you do SET GLOBAL gtid_slave_pos = MASTER_GTID, which GTID do you set, exactly?
Is it the GTID of the last replicated event, or GTID of the event that is currently hanging (waiting) on the master?

Comment by Joseph Glanville [ 2016-02-24 ]

I set the GTID of the most recent entry in the physical backup used to create the slave.

It appears ActiveTranx::compare doesn't know how to handle GTID replication, or at least isn't aware of it; something should be converting to the old log file name + position format before calling it.
Is there a way to get the master file name and position from the GTID during the initial semi-sync dump?

Comment by Elena Stepanova [ 2016-02-24 ]

I asked because I can easily reproduce the behavior if I use GTID of the hanging event, but not with the GTID of the last executed event.
Anyway, I'll pass it over to the replication expert knielsen for further analysis and comments on how GTID and semi-sync should work together.
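To illustrate the distinction with made-up GTIDs: suppose the last event contained in the backup is 0-1-5 and the hanging CREATE DATABASE on the master is 0-1-6:

```sql
-- Correct: claim only what the backup already contains. The slave then
-- requests 0-1-6, replicates it, ACKs it, and the master unblocks.
SET GLOBAL gtid_slave_pos = '0-1-5';

-- Incorrect (reproduces the hang): claiming the hanging event itself is
-- already applied means the slave never fetches or ACKs it.
-- SET GLOBAL gtid_slave_pos = '0-1-6';
```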

Comment by Joseph Glanville [ 2016-02-24 ]

Ahh, looks like I had a bug in my test script that was trying to reproduce the hang.
Fixing it so it uses the proper GTID fixes that particular hang.

After I got past this one I ran into a different issue which I am not sure is actually solvable.

Basically I have a system that sets up an automated chain-replicated MariaDB topology.

Normally it's set up with three nodes: a master with a semi-sync slave, and an async slave replicating from the semi-sync one,
i.e.:

Primary -> Sync -> Async.

When this is being set up, each node pulls a physical backup from its upstream peer using xtrabackup:
an innobackupex process is spawned in xbstream mode and the backup is streamed into the local data directory.

All of this works fine except in one special case: when someone attempts to call something like CREATE DATABASE while xtrabackup is pulling the first physical backup needed to set up the semi-sync slave.

What happens is that the CREATE DATABASE transaction goes into the active-transactions hash table and the server starts waiting for a slave to replicate it.
However, the backup isn't yet complete, so when innobackupex goes to finalize the backup and get a consistent snapshot of the binlog, it calls FLUSH TABLES WITH READ LOCK,
which seems to deadlock with the CREATE DATABASE transaction.
And because the semi-sync slave needs this physical backup to start replicating, the CREATE DATABASE transaction will never be unblocked.

Is there any way to avoid this scenario?

The only hunch I have right now, since the underlying filesystem is ZFS, is to look into whether I can create a consistent snapshot without needing to call FLUSH TABLES WITH READ LOCK or take some other database-wide lock.

Is it possible to take a consistent backup this way, or to prevent DDL like CREATE DATABASE from locking up the database until the semi-sync slave is online? Something like a stronger version of @@read_only?
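One way to narrow (though not eliminate) the lock exposure if going the ZFS route is to hold the global read lock only long enough to record binlog coordinates and trigger the filesystem snapshot, which happens outside SQL. A sketch:

```sql
-- Quiesce writes just long enough to snapshot the datadir.
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;  -- record binlog file/position for the new slave
-- ... take the ZFS snapshot of the data directory here, externally ...
UNLOCK TABLES;
```

Note this does not by itself avoid the deadlock described above: if a write is already waiting on a semi-sync ACK, FLUSH TABLES WITH READ LOCK can still block behind it.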

Comment by Kristian Nielsen [ 2016-02-24 ]

The GTID set in gtid_slave_pos is the last event executed on the slave.
So that GTID will never be received by the slave (but all following GTIDs
will).

I am not very familiar with semisync, but I'd expect that the GTID will then
also not be acknowledged (but following GTIDs will).

I think that is what Elena was referring to, so I agree, that seems right.

Comment by Joseph Glanville [ 2016-02-26 ]

Seeing as I have gotten to the bottom of this, and it's clear it's not actually a MariaDB bug but rather an interaction between semi-sync replication, locking, and innobackupex/xtrabackup, this should probably be closed as invalid.

Comment by Kristian Nielsen [ 2016-04-08 ]

Closing as per the discussion in the comments.

Generated at Thu Feb 08 07:36:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.