[MDEV-28843] slave_type_conversions does not check for NOT NULL or DEFAULT differences Created: 2022-06-14  Updated: 2023-10-16

Status: Stalled
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6.8
Fix Version/s: 10.6

Type: Bug Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Only tested on 10.6.8 so far, but it's probably on all versions



 Description   

Consider the following replication situation:

  • binlog_format=ROW
  • slave_type_conversions is empty, so tables should be defined the same on master and slave for replication to work
  • a column on the master is created with "NULL", but on the slave it has same type and size, but "NOT NULL DEFAULT ..." instead

E.g.:

  • on master: CREATE TABLE t1(id int not null primary key, msg varchar(100) NULL);
  • on slave: ALTER TABLE t1 MODIFY COLUMN msg varchar(100) NOT NULL DEFAULT 'abc';
  • on master: INSERT INTO TABLE t1 VALUES(1, NULL); SELECT * FROM t1;
  • -> result id:1; msg: NULL
  • on slave: SELECT * FROM t1;
  • -> result id: 1; msg: 'abc'

So now we have a data inconsistency between master and slave even though slave_type_conversions='' should have prevented this.

Double check to see that slave type conversion checks are indeed in effect:

  • on master CREATE TABLE t2(id int not null primary key, msg varchar(100));
  • on slave ALTER TABLE t2 MODIFY COLUMN msg varchar(200);
  • on master: INSERT INTO t2 VALUES(1, NULL);
  • on slave -> Last_SQL_Error: Column 1 of table 'test.t2' cannot be converted from type 'varchar(400 octets)' to type 'varchar(800 octets) character set utf8mb4'


 Comments   
Comment by Jan Lindström (Inactive) [ 2022-06-17 ]

julien.fritsch Not really, why and how data inconsistency between master and slave was created?

Comment by Hartmut Holzgraefe [ 2022-06-17 ]
  • master sends row image with a NULL value, for a column that allows NULLs
  • slave has different table schema, which is what slave_type_conversions='' (aka: none) should prevent
  • column on slave is NOT NULL with a default value
  • slave tries to apply row image, sees NULL value, replaces it with the given non-NULL default
  • so master and slave now have different data

What I would have expected to happen:

  • either slave outright refuses to apply the NULL value
  • slave only doing the replacement with default value in case slave_type_conversions are allowed
Comment by Jan Lindström (Inactive) [ 2022-06-17 ]

hholzgra My questions was why and how slave has different table schema? Using normal DDL supported by cluster you should not be able to do so.

Comment by Hartmut Holzgraefe [ 2022-06-17 ]

In this specific case it was about a rolling schema upgrade on a Galera cluster, and whether writes to the table would be safe while the RSU was ongoing or not.

But also in general users sometimes have different table schema on master and slave, that's what slave_type_conversions=ON/OFF is about in the first place ...

Comment by Jan Lindström (Inactive) [ 2022-08-17 ]

hholzgra Based on my testing on 10.6 Galera does work exactly as normal master-slave i.e.

include/master-slave.inc
[connection master]
connection slave;
set global slave_type_conversions='';
connection master;
CREATE TABLE t1(id int not null primary key, msg varchar(100) NULL) engine=innodb;
connection slave;
connection slave;
ALTER TABLE t1 MODIFY COLUMN msg varchar(100) NOT NULL DEFAULT 'abc';
show create table t1;
Table	Create Table
t1	CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `msg` varchar(100) NOT NULL DEFAULT 'abc',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
connection master;
show create table t1;
Table	Create Table
t1	CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `msg` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
INSERT INTO t1 VALUES(1, NULL);
connection slave;
connection master;
SELECT * FROM t1;
id	msg
1	NULL
connection slave;
SELECT * FROM t1;
id	msg
1	abc

Comment by Hartmut Holzgraefe [ 2022-08-17 ]

So master and slave do not have the same "msg" value (NULL vs. abc), conversion between master and slave has happened, which empty slave_type_conversion should have prevented.

And yes, it does not matter whether classic replication or Galera is used. I just ran into it on a Galera customer case first.

Comment by Jan Lindström (Inactive) [ 2022-08-17 ]

hholzgra You can use --binlog-row-image=FULL but however there's a definite overlap between these two options - the image and conversion. I'd say when the image is not FULL the conversion can't full apply. That is CONVERSION only applies to fields that are present in the non-full image. However, in my opinion in your case there is no conversion happening it just about DEFAULT value behavior.

Comment by Jan Lindström (Inactive) [ 2022-08-18 ]

Elkin I leave it to you define is this DEFAULT value behavior a bug or works-as-designed.

Comment by Andrei Elkin [ 2023-10-11 ]

hholzgra, I think this case is not a bug actually. RBR behaves correctly to replicate the master's NULL value which is - not converted (sic!), but - computed into abc. This result may be expected by the user who intends (and we of course have a great number of those practically) for the slave's ALTER this effect.

We might consider more strict policies to react on this type of table definition mismatch with an error unless the user would specify his intents to produce
a different results on slave explicitly.
But that'd be a new option not slave_type_conversions extension imo.

With this I am closing the ticket, feel free to reopen and explain if I am missing anything.

Comment by Hartmut Holzgraefe [ 2023-10-11 ]

So you are basically telling me that "VARCHAR() NULL" and "VARCHAR() NOT NULL" are the same type?

IMHO replicating from a NULLable column to a NOT NULL one is a lossy conversion and should only be allowed with

slave_type_conversions=ALLOW_LOSSY

in effect.

Comment by Andrei Elkin [ 2023-10-12 ]

hholzgra, we implemented the conversion copying the upstream. The conversion applies when the basic type (incl the size of a var-size type) is different. They do not replicate NULL-ability as a a part of metadata. Therefore mariadb slave can't separate the same type with and without NULL, and more generally with different defaults.
The idea was simple - if the user defines a replicated table with different column defaults (e.g a NULL remains a NULL on master but is to be converted to 'abc' on slave) she means that.

To
> slave_type_conversions='' should have prevented this
Of course this view makes sense but that was not meant to be at the inception of the option. Feel free to read the upstream docs on that subject too.

Overall I believe what you're trying to achieve is fair to qualify as a task, not a bug.

Generated at Thu Feb 08 10:03:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.