Multi-source replication (MDEV-253)

[MDEV-3793] Multi-source: Semisync replication is not fully supported for multiple masters and can cause replication failure and relay log corruption Created: 2012-10-04  Updated: 2020-05-04  Resolved: 2020-04-15

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 10.3.23

Type: Technical task
Priority: Minor
Reporter: Elena Stepanova
Assignee: Michael Widenius
Resolution: Done
Votes: 1
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-4920 Semi-sync slave plugin assumes only o... Closed
Relates

 Description   

Semisync replication doesn't properly distinguish between multiple master connections, which causes various problems. For example, if one master has the semisync plugin and another one doesn't, trying to enable semisync on the slave makes replication from both masters abort. The actual errors vary. With the test case below, I most often get the following:

On the connection to the master that does not have the semisync plugin:

Last_IO_Errno   1593
Last_IO_Error   Fatal error: Failed to run 'after_read_event' hook

On the connection to the master that has the semisync plugin, either:

Last_SQL_Errno  1594
Last_SQL_Error  Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

or

Last_IO_Errno   1595
Last_IO_Error   Relay log write failure: could not queue event from master

If we decide to support it, the corresponding variables Rpl_semi_sync_slave_status and rpl_semi_sync_slave_enabled should probably be made session-aware.

The test case is a draft and should not be added to the suite as is. It contains sleeps, which are slow and unreliable; once the problem is fixed, they should be replaced with proper waits and syncs.

Please make sure you have at least revno 3438 from 10.0-base, since the test uses the reset_master_slave.inc include file, which was added in that revision.

If you don't get the error on the first attempt, try again: sometimes the test gets lucky and passes, apparently because of some kind of race condition.

Test case (semisync.test):

# TODO: when the problem is fixed,
# the sleeps below should be replaced with proper
# waits for the slaves to start, and with synchronization
# with each master. For now, such waits would just make
# the test hang for a long time, so they are omitted here.
# Log error suppressions will also need to be added.
 
 
--connect (master1,127.0.0.1,root,,,$SERVER_MYPORT_1)
install soname 'semisync_master.so';
 
--connect (slave,127.0.0.1,root,,,$SERVER_MYPORT_3)
 
install soname 'semisync_slave.so';
set global rpl_semi_sync_slave_enabled = 1;
 
--replace_result $SERVER_MYPORT_1 MYPORT_1
eval change master 'master1' to
master_port=$SERVER_MYPORT_1,
master_host='127.0.0.1',
master_user='root';
 
start slave 'master1';
--sleep 2
 
--replace_result $SERVER_MYPORT_2 MYPORT_2
eval change master 'master2' to
master_port=$SERVER_MYPORT_2,
master_host='127.0.0.1',
master_user='root';
 
start slave 'master2';
--sleep 2
 
stop all slaves;
--sleep 2
start all slaves;
--sleep 3
--replace_result $SERVER_MYPORT_1 MYPORT_1 $SERVER_MYPORT_2 MYPORT_2
query_vertical show all slaves status;
 
# Cleanup
 
--source reset_master_slave.inc
uninstall plugin rpl_semi_sync_slave;
--disconnect slave
 
--connection master1
--source reset_master_slave.inc
uninstall plugin rpl_semi_sync_master;
--disconnect master1
 



 Comments   
Comment by Michael Widenius [ 2013-06-09 ]

Because semi-sync is a plugin, this is not a trivial task to fix.

What would need to be done:

  • Change all global variables in semisync_slave.h and semsync_slave.h to be dynamically allocated, hashed by connection name. The easiest way is probably to just move them into the ReplSemiSyncSlave structure.
  • Change the initial allocation so that when semi-sync starts, it copies the initial values to all running multi-source instances as default values.
  • Change the variable repl_semisync to be dynamically allocated, keyed by connection name.
  • Change the "semi_sync_slave_system_vars" and "semi_sync_slave_status_vars" variables to connection variables.
  • This is the hard part, as we don't have support for this kind of dynamic connection variable from a plugin.
  • Change fix_rpl_semi_sync_slave_enabled() to work with current connection name.

Add the connection name to Binlog_relay_IO_param. This is needed to be able to look up the correct value of rpl_semisync for the current master.
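The plan above boils down to replacing process-wide semi-sync flags with per-master state looked up by connection name, where the hook parameter carries that name. A minimal sketch of that idea in plain C++ follows. It is not the plugin's code or the eventual MariaDB 10.3 change: SemiSyncSlaveState, RelayIOParam (a stand-in for Binlog_relay_IO_param), SemiSyncSlaveRegistry and semisync_applies() are hypothetical names, and the sketch ignores how the server's plugin API registers system and status variables, which is the hard part noted above.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical per-master state; in the plugin these are globals such as
// rpl_semi_sync_slave_enabled and Rpl_semi_sync_slave_status.
struct SemiSyncSlaveState {
  bool enabled = false;  // per-connection counterpart of rpl_semi_sync_slave_enabled
  bool status = false;   // per-connection counterpart of Rpl_semi_sync_slave_status
};

// Hypothetical stand-in for Binlog_relay_IO_param, extended with the
// connection name so a hook knows which master it is serving.
struct RelayIOParam {
  std::string connection_name;  // "" would be the default (unnamed) connection
};

// Hypothetical registry: one SemiSyncSlaveState per master connection.
class SemiSyncSlaveRegistry {
 public:
  explicit SemiSyncSlaveRegistry(bool default_enabled)
      : default_enabled_(default_enabled) {}

  // Returns a copy, so no reference to shared state escapes the lock.
  SemiSyncSlaveState get(const std::string &name) {
    std::lock_guard<std::mutex> guard(mutex_);
    return lookup(name);
  }

  // Changes semi-sync for one named connection only.
  void set_enabled(const std::string &name, bool on) {
    std::lock_guard<std::mutex> guard(mutex_);
    lookup(name).enabled = on;
  }

  void set_status(const std::string &name, bool on) {
    std::lock_guard<std::mutex> guard(mutex_);
    lookup(name).status = on;
  }

 private:
  // Caller must hold mutex_. A connection seen for the first time starts
  // from the global default (the "copy initial values" step).
  SemiSyncSlaveState &lookup(const std::string &name) {
    auto it = states_.find(name);
    if (it == states_.end()) {
      SemiSyncSlaveState fresh;
      fresh.enabled = default_enabled_;
      it = states_.emplace(name, fresh).first;
    }
    return it->second;
  }

  bool default_enabled_;
  std::mutex mutex_;
  std::map<std::string, SemiSyncSlaveState> states_;
};

// Hypothetical hook-side check: whether semi-sync handling applies to the
// master this IO thread is reading from.
bool semisync_applies(SemiSyncSlaveRegistry &reg, const RelayIOParam &param) {
  return reg.get(param.connection_name).enabled;
}
```

For the scenario in the test case, the registry would start with the global default (semisync enabled), and `set_enabled("master2", false)` would turn it off only for the connection whose master lacks the plugin; `semisync_applies()` then returns true for master1 and false for master2, rather than one global flag affecting both connections. A mutex-guarded std::map is used for simplicity, since several slave IO threads may look up their state concurrently.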

Comment by Elena Stepanova [ 2013-08-26 ]

MDEV-4920 was marked a duplicate of this report.

Comment by Kristian Nielsen [ 2015-01-09 ]

I believe MySQL 5.7 has multi-source replication?
If so, the necessary changes to the plugin should be available there, I suppose, unless they also didn't bother to make it work.

Comment by Michael Widenius [ 2020-04-15 ]

This is marked as 'done' because, starting from MariaDB 10.3, semisync is no longer a plugin, which should hopefully solve this issue.

Generated at Thu Feb 08 06:51:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.