[MDEV-8915] Replication ignoring column which is missing on slave Created: 2015-10-07  Updated: 2023-06-06  Resolved: 2023-06-06

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication
Affects Version/s: 10.0.21
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Tim Soderstrom Assignee: Andrei Elkin
Resolution: Won't Fix Votes: 0
Labels: columns, galera, replication
Environment:

Ubuntu 12.04.3



 Description   

Replication appears to silently fail on a table which exists on the slave that is missing a column that is on the master. The master is part of a 3-node Galera setup. The column was created during an rolling-schema update which makes me wonder if that's the reason the column never made it to the slave. Row based replication is of course being used here.

The relay log on the slave includes the column when doing a 'mysqlbinlog -v -v'. The column is the last column on the table.

The table is getting updated on the slave, minus the column. So it looks like it's just silently ignoring the missing column from the replication stream but writing the rest of the data? Slave_skip_errors is empty and slave_sql_verify_checksum is enabled.



 Comments   
Comment by Elena Stepanova [ 2015-10-17 ]

m00dawg,

The description of the problem is quite unclear: e.g. you say that replication fails, but silently, but then you say that (some?) columns are updated; or, first you see that a column is missing on the slave, but then that it's present as the last column...

Could you please provide a specific example of what appears to be the problem? E.g. SHOW CREATE TABLE on the master and slave, event that is produced by the master, and what you see on the slave.

Please also attach config files from the master and slave.
Do I understand correctly that although the master is a part of the cluster, when we are talking about master->slave replication here, we assume the standard async replication?

Thanks.

Comment by Tim Soderstrom [ 2015-10-19 ]

Valid points. Here is the table we are using:

CREATE TABLE `test1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) DEFAULT NULL,
`other_id` int(11) DEFAULT NULL,
`date_start` datetime DEFAULT NULL,
`other2_id` int(11) DEFAULT NULL,
`other3_id` int(11) DEFAULT NULL,
`field2` int(11) DEFAULT NULL,
`field3` int(11) DEFAULT NULL,
`field4` varchar(255) DEFAULT NULL,
`field5` varchar(255) DEFAULT NULL,
`duration_hours` int(11) DEFAULT NULL,
`duration_minutes` int(11) DEFAULT NULL,
`event_result` varchar(255) DEFAULT NULL,
`result_url` varchar(255) DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`flag1` int(11) DEFAULT NULL,
`flag2` int(11) DEFAULT NULL,
`canceled` tinyint(1) DEFAULT '0',
`no_count` tinyint(1) DEFAULT '0',
`notes` text,
`time_field` tinyint(1) DEFAULT '0',
`other4_id` int(11) DEFAULT NULL,
`last_email_at` datetime DEFAULT NULL,
`field6` int(11) DEFAULT NULL,
`date_end` datetime DEFAULT NULL,
`repeat_type` int(11) DEFAULT NULL,
`display_icon` varchar(255) DEFAULT NULL,
`group_uuid` varchar(255) DEFAULT NULL,
`flag3` tinyint(1) DEFAULT '0',
`flag4` tinyint(1) DEFAULT '0',
`field7` int(11) DEFAULT NULL,
`field8` int(11) DEFAULT NULL,
`field9` int(11) DEFAULT '0',
`other5_id` int(11) DEFAULT NULL,
`master_league_event_id` int(11) DEFAULT NULL,
`field10` tinyint(1) DEFAULT '1',
`source` varchar(255) DEFAULT NULL,
`other6_id` int(11) DEFAULT NULL,
`field_updated_at` datetime DEFAULT NULL,
`field11` varchar(255) DEFAULT NULL,
`field_timezone` varchar(255) DEFAULT NULL,
`source_timezone` varchar(255) DEFAULT NULL,
`location_details` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_events_on_other_id` (`other_id`),
KEY `index_events_on_other2_id` (`other2_id`),
KEY `index_events_on_other3_id` (`other3_id`),
KEY `index_events_on_other4_id` (`other4_id`),
KEY `index_events_on_other5_id` (`other5_id`),
KEY `index_events_on_master_league_event_id` (`master_league_event_id`),
KEY `index_events_on_date_start` (`date_start`),
KEY `index_events_on_group_uuid` (`group_uuid`(8)),
KEY `index_events_on_team_id_and_date_start` (`other_id`,`date_start`),
KEY `index_events_on_other6_id` (`other6_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

We have a 3 node Galera cluster with one of the nodes replicating to a conventional MariaDB slave (node1 in our case). The MariaDB slave is what was silently failing. By silently failing, I mean that the added column did not exist on the slave, but table updates were still being processed by the slave. I confirmed in looking at the binary log that data for the missing row was in there. Replication should have failed due to the missing column on the slave, but instead, MariaDB was ignoring the column and inserting the others into the table.

The sequence of events that caused us to discover this was:

1. We performed a Rolling Schema Update on Galera cluster with one of the nodes serving as a replication master to a regular MariaDB slave
2. RSU involved adding a column which took about 15 minutes per node (location_details in the above table example)
3. Upon completion, the regular MariaDB slave replicating from one of the Galera nodes did not have the column, but was still receiving updates to the table, including contents of the column.

We were able to fix this by manually adding the column and then using pt-table-sync to sync the missing data. As we have corrected this, we do not have an immediate way to reproduce the bug at the moment. We should have more time to do this later this year if need be. For now, our work around is to run pt-table-checksum and compare the results from time to time.

Comment by Elena Stepanova [ 2019-04-06 ]

Elkin,
this is an old entry from the depth of our backlog. Please check if you see there anything worth digging into, otherwise feel free to close with any status of your choosing.

Comment by Jan Lindström [ 2023-06-06 ]

10.0 and 10.1 are EOL.

Generated at Thu Feb 08 07:30:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.