[MDEV-33215] Table is overwriten on Slave when created on Master Created: 2024-01-10  Updated: 2024-01-12

Status: Needs Feedback
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.3.20, 11.2.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oli Sennhauser Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: replication
Environment:

Linux, n.a.



 Description   

When I have a table test on the Slave which does NOT exist on the Master and later I create the table on the Master the table is dropped and recreated on the Slave!!!
This is not expected behaviour. We expect that replication stops with an error.
This is also not the behaviour from a competitive product.

  • Reproduce:
    On Slave:
    create table test (id int, data varchar(255));
    insert into test values (1, 'Some data'), (2, 'Some more data');
    select * from test;
    2 rows in set (0.000 sec)

On Master:
create table test (id int);

On Slave;

select * from test;
Empty set (0.001 sec)
show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
 
 show slave status\G
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
                Last_SQL_Errno: 0
                Last_SQL_Error: 

Really bad...



 Comments   
Comment by Sergei Golubchik [ 2024-01-10 ]

It's intentional, this behavior was implemented in this commit, ten years ago. It's controlled by the variable @@slave_ddl_exec_mode, documented as

Modes for how replication of DDL events should be executed. Legal values are STRICT and IDEMPOTENT (default). In IDEMPOTENT mode, the replica will not stop for failed DDL operations that would not cause a difference between the primary and the replica. In particular CREATE TABLE is treated as CREATE OR REPLACE TABLE and DROP TABLE is treated as DROP TABLE IF EXISTS.

While it's not a bug, perhaps, the default should be a more conservative STRICT? It's currently IDEMPOTENT which caused the behavior you're reporting.

Comment by Oli Sennhauser [ 2024-01-11 ]

We all (an old Oracle DBA, an old Informix DBA, and myself) have the opinion that the default should be a safe configuration. So we vote for STRICT.
IDEMPOTENT can lead to data loss. Better a breaking replication (by default) than loosing some data which is possibly not detected for days or even weeks.

Comment by Sergei Golubchik [ 2024-01-11 ]

I tend to agree. @@slave_exec_mode is STRICT by default. Before @@slave_exec_ddl_mode was implemented the behavior for DDLs was strict. MySQL, as you're saying, behaves as strict (even if it doesn't have this variable). But it was IDEMPOTENT for ~10 years, so it'll be a change in behavior.

knielsen, do you have an opinion here?

Comment by Kristian Nielsen [ 2024-01-11 ]

The "idempotent" mode was something monty implemented I think, I'm not 100% sure what the use-case of it is.

Maybe it can be related to the fact that replicating DDL on the slave is not crash-safe? It is possible for the slave to crash after the CREATE TABLE is replicated but before the GTID position is updated. In this case, the idempotency mode will allow the replication to continue after the slave comes back up. (I think DDLs by themselves are now crash safe, but the GTID position update for replicated DDL is not, AFAIK). Not sure that idempotency by itself is enough to make the slave crash safe for DDL in all cases though.

In my opinion, if the master and slave are diverged, then that is a problem in itself. There are many cases of diverged slave content that cannot be reliably detected, so just making CREATE TABLE "more strict" has limited value. If the slave and master are aligned, then idempotent or strict makes no difference.

If a user wants to deliberately run replication with different state on master and slave, this can be done in some scenarios, but then that user needs to be prepared to understand the possible issues in detail, including how to configure something like idempotency.

So I don't see a definitive argument one way or the other, either setting as default has merit.

Different people can have different opinions on what the defaults should be for the individual settings available. If we change the defaults back and forth every few years based on whoever's opinion is voiced the strongly at that particular time, then that at least is unlikely to be helpful for users.

Comment by Oli Sennhauser [ 2024-01-11 ]

Why we came to the problem: This was in a training session and the students got the task to "destroy" the replication. One student had the idea to create a table on the slave first, then on the master the same table to make replication break and we were completely surprised first, that replication did not break and later found out why this did not happen or respectively what happens and Sergey gave the explanation for the behaviour.

Real scenario: We see from time to time customer doing things on the slave (which does not happen on the master). For example reporting. So let us assume the case we have a table `webshop` on the master and it is replicated to the slave. On the slave we create a table `daily_aggregates` to aggregate the sales from the `webshop ` table for reporting. Now by accident somebody creates the table `daily_aggregate` on the master and immediately destroys our table on the slave with all its data in it... Not nice! Hopefully we have a backup from it. IMHO by default this should not happen.

Comment by Oli Sennhauser [ 2024-01-11 ]

About changing the defaults back and forth I agree with you. What is the general philosophy of MariaDB? Safety first or make all errors disappear until you cannot hide them any more?

Comment by Sergei Golubchik [ 2024-01-11 ]

"Different users have different opinions" — that's clearly subjective, we cannot change back and forth like that.

Here are less subjective arguments:

  • MariaDB has two slave...exec_mode variables. slave_exec_mode was implemented first, it is STRICT by default, slave_ddl_exec_mode was clearly modelled after the first one, but it has a different default, it's inconsistent.
  • Before slave_ddl_exec_mode was added, replication would've stopped if the slave table had existed, with the new variable having IDEMPOTENT default, it changed the behavior of replication in that case
  • MySQL replication stops if the slave table exists. While we can deviate from MySQL and we do, there has to be a really good reason for that, users generally appreciate when we're compatible.

These arguments aren't subjective. If slave_ddl_exec_mode would've been added now, it'd be a no-brainer, the default, of course, should've been STRICT.

But it was added ten years ago. Changing the default now would be an incompatible change. This is an objective argument for IDEMPOTENT. Is it strong enough to overweight three objective arguments for STRICT?

I personally — subjectively — think, it's not and we should change to STRICT. The fact that nobody noticed this in 10 years, nobody complained about this incompatible change in behavior when it happened — it shows it's a very low-key issue and hardly anyone will be affected negatively when we change the default.

Comment by Kristian Nielsen [ 2024-01-12 ]

Serg, I don't know why you skip the most important objective argument:

  • IDEMPOTENCY makes a normal, correct use case, where the user is not doing anything wrong, behave correctly and not break replication.
  • STRICT makes a slave crash at the wrong time break replication.

I just want to make sure this is understood. It's just a default, DBAs can change it back after upgrading if needed. But in general there seems to be a severe lack of understanding how disruptive any backwards-incompatible change can be to the huge fraction of MariaDB users that do not have dedicated teams or DBAs/developers assigned to manage the database, and who are not active in bug reports, public forums, or support cases.

Oli, the primary objective of MariaDB replication is to make correctly setup master-slave connections replicate correctly and never break. A secondary goal is to support, to the extent possible, more advanced use cases that are "less correct" such as deliberate differences on the slave like extra/missing tables etc.

Detecting diverging slaves as early as possible is always best, but in general is not possible to do reliably. Catching any random corruption deliberately made on a slave by a student is not a primary objective of MariaDB replication, that would require a significantly different replication architecture, I think.

Generated at Thu Feb 08 10:37:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.