Details
Bug
Status: Closed
Minor
Resolution: Fixed
10.0.2
Description
I set replication 1->2 to use GTID, start it, execute some events on server 1 and server 2, then set replication 2->1 to use GTID too, and attempt to start it.
In the example below, it fails with "Table 'test.t2' doesn't exist". Apparently it skips an event on startup, although the event is present in the binary log.
Output of the test case provided below
#
# For now we'll only have 1->2 running
#
# Server 1
# Stop replication 2->1
include/stop_slave.inc
#
# Server 2
# Use GTID for replication 1->2
include/stop_slave.inc
change master to master_use_gtid=1;
include/start_slave.inc
#
# Create some 0-1-* and 0-2-* events in binlog of server 2
connection server_1;
create table t1 (i int) engine=InnoDB;
insert into t1 values (1);
connection server_2;
create table t2 (i int) engine=InnoDB;
connection server_1;
insert into t1 values (2);
connection server_2;
insert into t2 values (1);
#
# All events are present in the binlog of server 2
show binlog events;
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000001 4 Format_desc 2 248 Server ver: 10.0.1-MariaDB-debug-log, Binlog ver: 4
slave-bin.000001 248 Gtid_list 2 271 []
slave-bin.000001 271 Binlog_checkpoint 2 310 slave-bin.000001
slave-bin.000001 310 Gtid 1 348 GTID 0-1-1
slave-bin.000001 348 Query 1 453 use `test`; create table t1 (i int) engine=InnoDB
slave-bin.000001 453 Gtid 1 491 BEGIN GTID 0-1-2
slave-bin.000001 491 Query 1 584 use `test`; insert into t1 values (1)
slave-bin.000001 584 Xid 1 611 COMMIT /* xid=277 */
slave-bin.000001 611 Gtid 2 649 GTID 0-2-3
slave-bin.000001 649 Query 2 754 use `test`; create table t2 (i int) engine=InnoDB
slave-bin.000001 754 Gtid 1 792 BEGIN GTID 0-1-3
slave-bin.000001 792 Query 1 885 use `test`; insert into t1 values (2)
slave-bin.000001 885 Xid 1 912 COMMIT /* xid=282 */
slave-bin.000001 912 Gtid 2 950 BEGIN GTID 0-2-4
slave-bin.000001 950 Query 2 1043 use `test`; insert into t2 values (1)
slave-bin.000001 1043 Xid 2 1070 COMMIT /* xid=283 */
#
# Server 1
# Start replication 2->1 using GTID,
# it fails with 'Table 'test.t2' doesn't exist'
# (which shows up either as a failure on sync_with_master,
# or more often as hanging start_slave.inc)
change master to master_use_gtid=1;
include/start_slave.inc
MariaDB [test]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: root
Master_Port: 16001
Connect_Retry: 1
Master_Log_File: slave-bin.000001
Read_Master_Log_Pos: 1070
Relay_Log_File: master-relay-bin.000002
Relay_Log_Pos: 597
Relay_Master_Log_File: slave-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1146
Last_Error: Error 'Table 'test.t2' doesn't exist' on query. Default database: 'test'. Query: 'insert into t2 values (1)'
Skip_Counter: 0
Exec_Master_Log_Pos: 310
Relay_Log_Space: 1053
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1146
Last_SQL_Error: Error 'Table 'test.t2' doesn't exist' on query. Default database: 'test'. Query: 'insert into t2 values (1)'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Using_Gtid: 1
1 row in set (0.00 sec)
Test case:
--source include/have_innodb.inc
--let $rpl_topology=1->2->1
--source include/rpl_init.inc

--echo #
--echo # For now we'll only have 1->2 running

--echo #
--echo # Server 1
--echo # Stop replication 2->1
--connection server_1
--source include/stop_slave.inc

--echo #
--echo # Server 2
--echo # Use GTID for replication 1->2
--connection server_2
--source include/stop_slave.inc
change master to master_use_gtid=1;
--source include/start_slave.inc

--echo #
--echo # Create some 0-1-* and 0-2-* events in binlog of server 2

--enable_connect_log

--connection server_1
create table t1 (i int) engine=InnoDB;
insert into t1 values (1);
--save_master_pos

--connection server_2
--sync_with_master
create table t2 (i int) engine=InnoDB;
--save_master_pos

--connection server_1
insert into t1 values (2);
--save_master_pos

--connection server_2
--sync_with_master
insert into t2 values (1);
--save_master_pos

--disable_connect_log

--echo #
--echo # All events are present in the binlog of server 2

show binlog events;

--echo #
--echo # Server 1
--echo # Start replication 2->1 using GTID,
--echo # it fails with 'Table 'test.t2' doesn't exist'
--echo # (which shows up either as a failure on sync_with_master,
--echo # or more often as hanging start_slave.inc)

--connection server_1
change master to master_use_gtid=1;
--source include/start_slave.inc
--sync_with_master

--source include/rpl_end.inc
cnf file:
!include suite/rpl/rpl_1slave_base.cnf
!include include/default_client.cnf

[mysqld.1]
log-slave-updates
loose-innodb

[mysqld.2]
log-slave-updates
loose-innodb
bzr version-info
revision-id: knielsen@knielsen-hq.org-20130503092729-gedp152b19k5wdnj
revno: 3626
branch-nick: 10.0-base
Issue Links
- relates to MDEV-26 Global transaction ID (Closed)
Thanks for testing this!
You are in uncharted territory: I did consider circular topologies in the
design, but I have not tested them yet.
There is one problem with your test. You have two masters active at the same
time. Doing this with GTID requires configuring different gtid_domain_id for
the two masters.
It does not help that you stopped the direction 2->1. What matters is that you
have two masters (whether their slave is running or not at the precise
moment), and you are doing updates on one without first replicating all
changes from the other.
Concretely, we have
S2: create table t2 ...
S1: insert into t1 ...
On S2, "create table t2" will be binlogged before "insert into t1". But on S1,
"insert into t1" is binlogged first.
So when S1 connects with GTID as slave to S2, it asks to start from the
"insert into t1" which is the latest GTID it applied in domain 0. But this is
after "create table t2" in binlog on S2, so that event is lost.
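To make the lost event concrete, here is a small Python model of the connect step. It is only a sketch of the behaviour described above, not the server's actual code: `events_sent_to_slave` and the `Event` type are hypothetical names, and the binlog contents are copied from the `show binlog events` output in the report.

```python
from typing import NamedTuple


class Event(NamedTuple):
    domain: int
    server: int
    seq: int
    query: str

    @property
    def gtid(self) -> str:
        return f"{self.domain}-{self.server}-{self.seq}"


# Binlog of S2, in the order shown in the report.
S2_BINLOG = [
    Event(0, 1, 1, "create table t1 (i int) engine=InnoDB"),
    Event(0, 1, 2, "insert into t1 values (1)"),
    Event(0, 2, 3, "create table t2 (i int) engine=InnoDB"),
    Event(0, 1, 3, "insert into t1 values (2)"),
    Event(0, 2, 4, "insert into t2 values (1)"),
]


def events_sent_to_slave(binlog, slave_pos):
    """Return the events a master would send to a connecting slave.

    slave_pos maps domain -> (server, seq) of the last GTID the slave
    applied in that domain. Within a domain, everything up to and
    including that GTID in the master's binlog is skipped.
    """
    sent = []
    reached = set()  # domains whose start position has been found
    for ev in binlog:
        if ev.domain not in slave_pos or ev.domain in reached:
            sent.append(ev)
        elif (ev.server, ev.seq) == slave_pos[ev.domain]:
            reached.add(ev.domain)
    return sent


# S1 last applied its own "insert into t1 values (2)", i.e. GTID 0-1-3.
sent = events_sent_to_slave(S2_BINLOG, {0: (1, 3)})
for ev in sent:
    print(ev.gtid, ev.query)
```

In this model only `0-2-4` (the insert into t2) is sent. `0-2-3` (create table t2) sits before `0-1-3` in the binlog of S2, so it is skipped, and the insert then fails on S1 with the error from the report.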
There are two ways to do this correctly:
1. Either configure different domain_ids for S1 and S2.
2. Or alternatively, use a single domain and make sure that everything is
replicated S2->S1 before doing changes on S1, and vice versa. Basically, let
both slave directions run at the same time, and --sync-with-master each time
before doing updates on the next server.
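Option 1 can be illustrated with the same kind of toy model. The sketch below assumes S1 logs in domain 1 and S2 in domain 2 (hypothetical values for `gtid_domain_id`), and that sequence numbers are counted per domain. The function name `events_for_slave` is made up for the example, and none of this is the server's actual code.

```python
# Each event is (domain, server, seq, query).
# Binlog of S2 if S1 used domain 1 and S2 used domain 2 (assumed values).
S2_BINLOG_TWO_DOMAINS = [
    (1, 1, 1, "create table t1 (i int) engine=InnoDB"),
    (1, 1, 2, "insert into t1 values (1)"),
    (2, 2, 1, "create table t2 (i int) engine=InnoDB"),
    (1, 1, 3, "insert into t1 values (2)"),
    (2, 2, 2, "insert into t2 values (1)"),
]


def events_for_slave(binlog, slave_pos):
    """Return the events sent to a slave at position slave_pos.

    slave_pos maps domain -> (server, seq) of the last applied GTID.
    Positions are tracked per domain, so a domain the slave has never
    seen is sent from the beginning.
    """
    sent = []
    reached = set()  # domains whose start position has been found
    for domain, server, seq, query in binlog:
        if domain not in slave_pos or domain in reached:
            sent.append((f"{domain}-{server}-{seq}", query))
        elif (server, seq) == slave_pos[domain]:
            reached.add(domain)
    return sent


# S1 has applied its own events up to 1-1-3 and nothing from domain 2.
received = events_for_slave(S2_BINLOG_TWO_DOMAINS, {1: (1, 3)})
for gtid, query in received:
    print(gtid, query)
```

Because the position in domain 1 says nothing about domain 2, both events from S2 are sent in order, including the `create table t2`.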
So I would like to handle this in two stages.
First, we should make sure that (1) and (2) actually work correctly (they
should, but it definitely needs testing, there are likely bugs).
Second, while what the test does is fundamentally incorrect and cannot work,
it would still be best if we could give a better user experience: a clear
error message rather than silently dropped events.
"First" we should do now. "Second" I would like to revisit later, when the
basic stuff has been better tested and is solid. Multimaster ring is somewhat
of an advanced concept, and it is more reasonable to expect more knowledge
from the DBA who sets that up. My vague idea is that we could implement a GTID
"strict" mode that would detect the wrong configuration and give an error
immediately. The detection would be to see that S2 gets an event from S1 with
the same sequence number that it already logged itself to its own binlog. Such
strict mode can probably not be on by default though, as then a simple upgrade
to 10.0 would break ring setups, even if users have no plans to use GTID.
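As a rough illustration of the detection idea, the following Python sketch rejects an event whose sequence number is not higher than the last one already logged in the same domain. The names `check_strict` and `GtidOrderError` are hypothetical; this is only one way such a check could look, not a description of an existing implementation.

```python
class GtidOrderError(Exception):
    """Raised when an event would break sequence-number order in a domain."""


def check_strict(binlog_state, domain, server, seq):
    """Record a GTID in binlog_state, or raise if it is out of order.

    binlog_state maps domain -> highest sequence number logged so far.
    """
    last = binlog_state.get(domain, 0)
    if seq <= last:
        raise GtidOrderError(
            f"GTID {domain}-{server}-{seq} arrives after sequence number "
            f"{last} was already logged in domain {domain}"
        )
    binlog_state[domain] = seq


# Replay what S2 logs in the test: two events from S1, then its own 0-2-3.
state = {}
for gtid in [(0, 1, 1), (0, 1, 2), (0, 2, 3)]:
    check_strict(state, *gtid)

# S2 now receives 0-1-3 from S1: same sequence number as its own 0-2-3.
try:
    check_strict(state, 0, 1, 3)
    detected = False
except GtidOrderError as err:
    detected = True
    print(err)
```

With the events from the test case, the check triggers as soon as S2 receives `0-1-3`, which is the point where the two masters have diverged.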
What do you think?