[MDEV-28292] Allow both --replicate-same-server-id=on and --log-slave-updates=on to be enabled at the same time Created: 2022-04-11  Updated: 2022-07-26  Resolved: 2022-04-25

Status: Closed
Project: MariaDB Server
Component/s: Replication
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Daniel Lenski Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-23990 enable replicate-same-server-id while... Open
relates to MDEV-27760 event may non stop replicate in circu... Closed
relates to MDEV-28609 refine gtid-strict-mode to ignore sam... Closed

 Description   

Enabling both log-slave-updates and replicate-same-server-id simultaneously is possible in MySQL 8.0+ thanks to GTID.

This is a useful feature to guarantee durability in certain replication configurations. MariaDB should add this capability. This requires adding new code to gracefully ignore duplicated GTIDs.

I am willing to write this code but request advice from upstream core contributors on how to properly do it.

Research

  • MariaDB 10.6 forbids the combination of log-slave-updates and replicate-same-server-id as a vestige of old code from MySQL, even though it always writes GTIDs to the binlog.
  • MySQL <8.0 forbade this combination entirely. An enhancement in MySQL >=8.0 allows it if GTID logging is enabled, because logging GTIDs should permit the identification and deduplication of repeated transactions.

Testing

I modified MariaDB 10.6 to allow the combination of these two flags:

 --- a/sql/mysqld.cc
 +++ b/sql/mysqld.cc
 @@ -4853,24 +4853,6 @@
    DBUG_ASSERT((uint)global_system_variables.binlog_format <=
                array_elements(binlog_format_names)-1);
  
 -#ifdef HAVE_REPLICATION
 -  if (opt_log_slave_updates && replicate_same_server_id)
 -  {
 -    if (opt_bin_log)
 -    {
 -      sql_print_error("using --replicate-same-server-id in conjunction with "
 -                      "--log-slave-updates is impossible, it would lead to "
 -                      "infinite loops in this server.");
 -      unireg_abort(1);
 -    }
 -    else
 -      sql_print_warning("using --replicate-same-server-id in conjunction with "
 -                        "--log-slave-updates would lead to infinite loops in "
 -                        "this server. However this will be ignored as the "
 -                        "--log-bin option is not defined.");
 -  }
 -#endif
 -
    if (opt_bin_log)
    {
      /* Reports an error and aborts, if the --log-bin's path 
 @@ -6552,9 +6534,7 @@
     0, 0, 0, GET_STR | GET_ASK_ADDR, REQUIRED_ARG, 0, 0, 0, 0, 0, 0},
  #ifdef HAVE_REPLICATION
    {"replicate-same-server-id", 0,
 -   "In replication, if set to 1, do not skip events having our server id. "
 -   "Default value is 0 (to break infinite loops in circular replication). "
 -   "Can't be set to 1 if --log-slave-updates is used.",
 +   "In replication, if set to 1, do not skip events having our server id.",
     &replicate_same_server_id, &replicate_same_server_id,
     0, GET_BOOL, NO_ARG, 0, 0, 0, 0, 0, 0},
  #endif

Then I set the combination of the two flags in rpl_circular_for_4_hosts.test, an MTR test of circular replication with a 4-server configuration:

 --- a/mysql-test/suite/rpl/t/rpl_circular_for_4_hosts.cnf
 +++ b/mysql-test/suite/rpl/t/rpl_circular_for_4_hosts.cnf
 @@ -2,18 +2,22 @@
  
  [mysqld.1]
  log-slave-updates
 +replicate-same-server-id
  loose-innodb
  
  [mysqld.2]
  log-slave-updates
 +replicate-same-server-id
  loose-innodb
  
  [mysqld.3]
  log-slave-updates
 +replicate-same-server-id
  loose-innodb
  
  [mysqld.4]
  log-slave-updates
 +replicate-same-server-id
  loose-innodb
  
  [ENV]

With both flags set, the servers start up successfully, but the test fails after one of the slaves is stopped and then attempts to restart. The failure is caused by errors from duplicate transactions:

2022-04-07 16:23:44 11 [ERROR] Slave SQL: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO t2 (b,c) VALUES('MDEV-515', 100)', Gtid 0-1-3, Internal MariaDB error code: 1062
2022-04-07 16:23:44 11 [Warning] Slave: Duplicate entry '1' for key 'PRIMARY' Error_code: 1062

The above might be solvable by gracefully dropping duplicate GTIDs, but I need upstream guidance on how to properly proceed with this.
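
To make the idea of "gracefully dropping duplicate GTIDs" concrete, here is a minimal Python sketch. It is not MariaDB code: the `Gtid` and `DuplicateGtidFilter` names are hypothetical, and it assumes that within one replication domain a GTID whose sequence number is not higher than the last applied one has already been executed. That assumption only holds if transactions arrive in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    """A MariaDB-style GTID of the form domain_id-server_id-seq_no."""
    domain_id: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text):
        domain, server, seq = (int(part) for part in text.split("-"))
        return cls(domain, server, seq)


class DuplicateGtidFilter:
    """Hypothetical helper: remembers the highest seq_no applied per domain."""

    def __init__(self):
        self._last_seq_no = {}  # domain_id -> highest applied seq_no

    def should_apply(self, gtid):
        last = self._last_seq_no.get(gtid.domain_id, 0)
        if gtid.seq_no <= last:
            # Assumed to be a repeat of an already applied transaction: drop it.
            return False
        self._last_seq_no[gtid.domain_id] = gtid.seq_no
        return True


def apply_stream(gtid_texts):
    """Return the GTIDs that would be applied, skipping repeats."""
    flt = DuplicateGtidFilter()
    return [text for text in gtid_texts if flt.should_apply(Gtid.parse(text))]
```

With such a filter, a second arrival of `0-1-3` (the GTID in the log above) would be skipped instead of failing with a duplicate-key error.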



 Comments   
Comment by Daniel Lenski [ 2022-04-11 ]

MDEV-23990 is a very similar issue from ~2 years ago, but appears to be specifically focused on Galera replication.

Comment by Daniel Black [ 2022-04-12 ]

MDEV-27760, which resolves the endless replication problem, has been merged into 10.6.

Elkin commented "So to answer the question, yes, to support the combination is feasible."

Comment by Andrei Elkin [ 2022-04-12 ]

Thanks, danblack. Let me be more specific.
MDEV-27760 actually provides a way to accept events carrying the server's own server-id, though it requires
the slave server to be configured with
set @@global.rpl_semi_sync_slave_enabled=1; and
set @@global.gtid_strict_mode=1;.

Enabling semisync here does not mean the slave actually operates in semisync mode; it is just a formal setting (of readiness) and can be lifted altogether.
dlenski: maybe that would be enough for your needs?

Comment by Daniel Lenski [ 2022-04-12 ]

Thank you!

elkin wrote:

MDEV-27760 actually provides a way to accept events carrying the server's own server-id, though it requires
the slave server to be configured with
set @@global.rpl_semi_sync_slave_enabled=1; and
set @@global.gtid_strict_mode=1;

Enabling semisync here does not mean the slave actually operates in semisync mode; it is just a formal setting (of readiness) and can be lifted altogether.

Does this mean that both replicate-same-server-id=on and log-slave-updates=on can be enabled as long as gtid-strict-mode=on and rpl_semi_sync_slave_enabled=1 are also enabled?

I didn't fully understand your comment about "the semisync enablement."

Comment by Andrei Elkin [ 2022-04-14 ]

dlenski: regarding the semisync enablement, the MDEV-27760 fixes make same-server-id transactions accepted only when the two slave variables are set,
but one does not have to enable semisync on the master.
You also don't have to touch replicate-same-server-id, which was designed in pre-GTID times and whose role is to terminate event circulation. The CHANGE MASTER options DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS are supposed to supersede it.

The role of gtid_strict_mode=ON is to prevent re-execution of transactions with the same GTID.
(Obviously that does not relieve the user from ensuring correct ordering of replicated transactions.)
The log-slave-updates value is irrelevant in the above, so ON is allowed.
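
As a rough illustration of the conditions described in this comment, here is a toy Python model. It is not the server's code: `SlaveSettings` and `accepts_event` are hypothetical names, and the model assumes that events originating from other servers are always accepted.

```python
from dataclasses import dataclass


@dataclass
class SlaveSettings:
    server_id: int
    rpl_semi_sync_slave_enabled: bool
    gtid_strict_mode: bool
    log_slave_updates: bool = True  # irrelevant to the decision below


def accepts_event(settings, event_server_id):
    """Toy model: would the slave accept an event from this server-id?"""
    if event_server_id != settings.server_id:
        # Assumption: events from other servers are not affected.
        return True
    # Own server-id events are accepted only when both variables are set.
    return settings.rpl_semi_sync_slave_enabled and settings.gtid_strict_mode
```

In this model, changing `log_slave_updates` never changes the result, which mirrors the statement that its value is irrelevant here.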

Comment by Otto Kekäläinen [ 2022-04-25 ]

Using gtid-strict-mode=on is the correct path forward, thus closing this issue.

The docs at https://mariadb.com/kb/en/gtid/#gtid_strict_mode and a couple of mentions of GTID at https://mariadb.com/kb/en/mariadb-vs-mysql-compatibility/ are a bit thin, but I don't have any suggestions to improve them right now.
