[MDEV-24367] S3 Engine: delayed slave may stop with an error Created: 2020-12-08  Updated: 2020-12-15

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - S3
Affects Version/s: 10.5
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: replication

Issue Links:
Problem/Incident

 Description   

(I am not sure whether this behavior is a bug or a fundamental limitation. If it is a limitation, it is not documented, and we need to define it.)

The issue:
Consider a case where table T is first created in Aria (or any other regular engine), and then it is re-created in S3. The operations on T while it was in Aria are binlogged, and the slave will try to apply them. If the slave is delayed, by the time it applies those events the table already exists in S3, so it will try to apply the changes to the read-only S3 table.

It is tempting to define the limitation as "for any dbname.tablename, you should decide beforehand which storage engine you are going to use, and only use that engine". The problem is that for S3 you cannot do that: the only way to fill an S3 table is to create an Aria table (or another local table?) and then run "ALTER TABLE t ENGINE=S3". That is, it is unavoidable that t is a local table first and an S3 table afterwards.
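A minimal sketch of the forced workflow (the same statements as in the testcase below; names are illustrative). Note the window between the INSERT and the ALTER: the row events are binlogged against an Aria table, but a delayed slave will execute them after the table has already moved to S3:

create table t (a int, b int) engine=aria;  -- must start in a local engine
insert into t values (1,1),(2,2),(3,3);     -- binlogged as events on an Aria table
alter table t engine=s3;                    -- from here on, t is read-only in S3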

A testcase:

mysql-test/suite/s3/rpl_c.cnf

!include ../rpl/my.cnf
!include ./my.cnf
!include ./slave.cnf

mysql-test/suite/s3/rpl_c.test

--source include/have_s3.inc
--source include/master-slave.inc
 
--disable_warnings
drop table if exists t1;
--enable_warnings
 
connection slave;
show variables like 's3_slave%';
 
stop slave;
 
connection master;
create table t1 (a int, b int) engine=aria;
insert into t1 values (1,1),(2,2),(3,3);
alter table t1 engine=s3;
 
connection slave;
show tables;
start slave;
 
connection master;
--sync_slave_with_master
 
connection slave;
select * from t1;
 
--source include/rpl_end.inc

This will fail with:

mysqltest: At line 27: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 1056, 300, '')' returned NULL indicating slave SQL thread failure

SHOW SLAVE STATUS:

Last_Error      Could not execute Write_rows_v1 event on table test.t1; 
Table 't1' is read only, Error_code: 1036; handler error 
HA_ERR_TABLE_READONLY; the event's master log 
master-bin.000001, end_log_pos 850

**** SHOW BINLOG EVENTS on default ****
binlog_name = 'master-bin.000001'
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name        Pos     Event_type      Server_id       End_log_pos     Info
master-bin.000001       4       Format_desc     1       256     Server ver: 10.5.9-MariaDB-debug-log, Binlog ver: 4
master-bin.000001       256     Gtid_list       1       285     []
master-bin.000001       285     Binlog_checkpoint       1       329     master-bin.000001
master-bin.000001       329     Gtid    1       371     GTID 0-1-1
master-bin.000001       371     Query   1       489     use `test`; DROP TABLE IF EXISTS `t1` /* generated by server */
master-bin.000001       489     Gtid    1       531     GTID 0-1-2
master-bin.000001       531     Query   1       640     use `test`; create table t1 (a int, b int) engine=aria
master-bin.000001       640     Gtid    1       682     BEGIN GTID 0-1-3
master-bin.000001       682     Annotate_rows   1       744     insert into t1 values (1,1),(2,2),(3,3)
master-bin.000001       744     Table_map       1       790     table_id: 32 (test.t1)
master-bin.000001       790     Write_rows_v1   1       850     table_id: 32 flags: STMT_END_F
master-bin.000001       850     Query   1       923     COMMIT
master-bin.000001       923     Gtid    1       965     GTID 0-1-4
master-bin.000001       965     Query   1       1056    use `test`; alter table t1 engine=s3



 Comments   
Comment by Sergei Petrunia [ 2020-12-08 ]

One possible way to solve this: what if the table in S3 included the replication GTID that was current at the time of the CREATE TABLE statement? Let it be CREATE_GTID. If the slave's current GTID is before CREATE_GTID, the slave cannot see the table.
(This is just an idea, need to think it through carefully)
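A purely illustrative walk-through of the rule, using the GTIDs from the binlog dump above and assuming the ALTER's GTID 0-1-4 were recorded as CREATE_GTID:

-- On the delayed slave, while it is still applying the Aria-era events:
select @@gtid_slave_pos;  -- e.g. '0-1-2'
-- 0-1-2 is before the hypothetical CREATE_GTID 0-1-4, so table discovery
-- would skip the S3 copy of t1, and the Write_rows_v1 event would apply
-- to the local Aria table. Once the slave has applied GTID 0-1-4, the S3
-- table becomes visible.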

Comment by Sergei Petrunia [ 2020-12-09 ]

My recollection of monty's suggestion in the discussion yesterday:

The described problem occurs due to the slave "seeing" the same table in both

  • A. the remote (in this case, S3) storage engine,
  • B. the local engine.

The slave is applying binlog events that describe past operations on the local table (B), but it can also see the present remote table (A).

The idea is: if the slave thread is about to create a local table and there is a naming conflict with a remote table, it should "hide" the remote table by deleting its .frm file.

This should only be allowed to happen when the thread creating the local table is a replication slave thread. (We don't want a regular user to be able to run create table t (...) engine=myisam and silently hide the remote table t with it. That would be error-prone.)

Things to check: are we sure there's no situation where the server discovers that the table exists in both engines and raises an error condition?
(TODO: anything else?)
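To spell out the distinction, a sketch of the proposed behavior (not implemented; t1 is assumed to already exist in S3):

-- Regular user connection: must not be able to silently hide the S3 table:
create table t1 (a int) engine=myisam;  -- should be rejected, or at least
                                        -- not shadow the S3 t1
-- Slave SQL thread applying a binlogged CREATE TABLE t1: would be allowed
-- to delete the S3 table's .frm file and create the local t1 in its place.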

Comment by Sergei Petrunia [ 2020-12-09 ]

A possible workaround is:

Do not create a table first in the local engine and then in the remote engine under the same name; use different table names for tables in different engines.

That is, the workflow should be like:

create table t1 ( ... ) engine=aria;        -- or any other local engine
-- ... insert/update rows ...
alter table t1 engine=s3, rename to t1_s3;
-- use t1_s3 from that point on
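With this workaround, the master-side part of the failing test above would become (a sketch; only the ALTER line changes, and the final SELECT would target t1_s3):

connection master;
create table t1 (a int, b int) engine=aria;
insert into t1 values (1,1),(2,2),(3,3);
alter table t1 engine=s3, rename to t1_s3;

The binlogged row events still reference t1, which on the slave only ever names the local Aria table, so discovery never resolves them to the read-only S3 copy.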
