[MDEV-33057] ALTER TABLE from S3 to a different engine on the leader breaks replicas with binlog_alter_two_phase; Created: 2023-12-18 Updated: 2023-12-18 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - S3 |
| Affects Version/s: | 10.11.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | VAROQUI Stephane | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
We run a replication cluster that mixes InnoDB tables (the active dataset) with S3 tables (a read-only dataset) stored on a single shared S3 server. When we need to modify the content of an S3 table, we alter it back to InnoDB on the leader of the replication cluster. The new InnoDB table then gets deleted on all replicas, probably via table discovery, which breaks the ability to fail over the cluster unless a replication filter is set on the modified table. The solution would be to inject a CREATE OR REPLACE TABLE DDL followed by row events for all rows fetched into InnoDB, so that the InnoDB table is materialized on all replicas. Looking at the binlogs, we can see an attempt to do this using binlog-alter-two-phase. It failed because, with optimistic parallel replication and semi-sync, the replication SQL thread received the ALTER TABLE but deadlocked with a DROP TABLE.
Killing the ALTER TABLE and DROP TABLE SQL threads and restarting the slave recovers the InnoDB table on the replica. For the record, we used
We can confirm that disabling the feature with set global binlog_alter_two_phase=OFF;, as suggested by @Kristian Nielsen, is a valid workaround: the table gets restored on all replicas without replication issues. |
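A minimal reproduction sketch of the scenario described above, in SQL. It assumes the S3 engine is already configured (bucket, credentials, endpoint) on every server, that the replicas share the leader's S3 storage, and it uses the table name t_s3 from the first comment below as a hypothetical example. The replica settings (4 parallel threads, optimistic mode) are illustrative assumptions, since the reporter's exact configuration is not included in the report.

```sql
-- Replicas: optimistic parallel replication (assumed settings;
-- the slave threads are stopped before changing them).
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;
SET GLOBAL slave_parallel_mode = 'optimistic';
START SLAVE;

-- Leader: enable two-phase ALTER binlogging for this session.
SET SESSION binlog_alter_two_phase = ON;

-- Leader: create a table, fill it, and move it to shared S3 storage
-- (the table is read-only while it uses ENGINE=S3).
CREATE TABLE t_s3 (id INT PRIMARY KEY, payload VARCHAR(100)) ENGINE=InnoDB;
INSERT INTO t_s3 VALUES (1, 'a'), (2, 'b'), (3, 'c');
ALTER TABLE t_s3 ENGINE=S3;

-- Leader: bring the table back to InnoDB in order to modify it.
-- Reported symptom: on the replicas, the SQL threads applying this
-- ALTER TABLE and a DROP TABLE deadlock.
ALTER TABLE t_s3 ENGINE=InnoDB;

-- Workaround confirmed in the report: disable two-phase ALTER binlogging.
-- SET GLOBAL only affects sessions opened afterwards.
SET GLOBAL binlog_alter_two_phase = OFF;
```

Whether the deadlock reproduces may depend on timing and on the semi-sync setup mentioned in the report, which this sketch does not configure.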
| Comments |
| Comment by Kristian Nielsen [ 2023-12-18 ] |
|
I do not follow from the description why the table gets deleted on the replicas. Maybe I'm not sufficiently familiar with how S3 tables work: are they physically stored in some S3 cloud storage, with a storage engine layer that merely translates between SQL and the corresponding network requests to the S3 storage? If the ALTER TABLE t_s3 ENGINE=InnoDB succeeds on the leader, then why doesn't it work the same on the replicas? Is the intention that the table should be altered to InnoDB on all the replicas, or only on the leader? I also did not understand the proposed solution "to inject cre"; can you elaborate on what "cre" means? |
| Comment by Kristian Nielsen [ 2023-12-18 ] |
|
From Zulip discussions, we found that --binlog-alter-two-phase appears to be enabled. This explains how ALTER TABLE and DROP TABLE end up replicating in parallel, and probably indicates a bug in the --binlog-alter-two-phase feature. |