[MDEV-24087] s3.replication_partition fails in buildbot wiht replication failure Created: 2020-11-02  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - S3, Tests
Affects Version/s: 10.5
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: None


 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-fedora32-amd64/builds/621/steps/mtr-s3/logs/stdio

10.5 8e1e2856f2523c225a81840059e93fa9f61fbacf

s3.replication_partition 'innodb,mix'    [ fail ]
        Test ended at 2020-11-01 21:32:17
 
CURRENT_TEST: s3.replication_partition
analyze: sync_with_master
mysqltest: At line 106: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 4054, 300, '')' returned NULL indicating slave SQL thread failure
 
The result from queries just before the failure was:
< snip >
select sum(c1) from t1;
ERROR 42S02: Table 'database.t1' doesn't exist
start slave;
connection master;
#
# Check altering partitioned table to S3 and back
# Checks also rename partitoned table and drop partition
#
CREATE TABLE t2 (
c1 int primary key,
c2 int DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (c1)
(PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400));
insert into t2 select seq*100,seq*100 from seq_1_to_3;
alter table t2 engine=S3;
rename table t2 to t1;
alter table t1 drop partition p1;
 
More results from queries before failure can be found in /dev/shm/var/log/replication_partition.log
 
 
 == /dev/shm/var/tmp/analyze-sync_with_master-mysqld.1.err ==
 
############################## default ##############################
 
**** SHOW WARNINGS on default ****
SHOW WARNINGS;
Level	Code	Message
 
**** SELECT replication-related variables on default ****
SELECT NOW(), @@SERVER_ID;
NOW()	@@SERVER_ID
2020-11-01 21:32:17	1
 
**** SHOW SLAVE STATUS on default ****
SHOW SLAVE STATUS;
 
**** SHOW MASTER STATUS on default ****
SHOW MASTER STATUS;
File	master-bin.000001
Position	4054
Binlog_Do_DB	
Binlog_Ignore_DB	
 
**** SHOW SLAVE HOSTS on default ****
SHOW SLAVE HOSTS;
Server_id	2
Host	127.0.0.1
Port	16001
Master_id	1
 
**** SHOW PROCESSLIST on default ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
12	root	localhost:55242	NULL	Binlog Dump	0	Master has sent all binlog to slave; waiting for more updates	NULL	0.000
13	root	localhost	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
 
**** SHOW BINARY LOGS on default ****
SHOW BINARY LOGS;
Log_name	File_size
master-bin.000001	4054
 
**** SHOW BINLOG EVENTS on default ****
binlog_name = 'master-bin.000001'
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	256	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
master-bin.000001	256	Gtid_list	1	285	[]
master-bin.000001	285	Binlog_checkpoint	1	329	master-bin.000001
master-bin.000001	329	Gtid	1	371	GTID 0-1-1
master-bin.000001	371	Query	1	530	create database s3_test_b64d55591c8911ebb00c525400123456
master-bin.000001	530	Gtid	1	572	GTID 0-1-2
master-bin.000001	572	Query	1	762	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
c1 INT DEFAULT NULL
) ENGINE=Aria
PARTITION BY HASH (c1)
PARTITIONS 3
master-bin.000001	762	Gtid	1	804	BEGIN GTID 0-1-3
master-bin.000001	804	Query	1	964	use `s3_test_b64d55591c8911ebb00c525400123456`; INSERT INTO t1 VALUE (1), (2), (101), (102), (201), (202)
master-bin.000001	964	Query	1	1073	COMMIT
master-bin.000001	1073	Gtid	1	1115	GTID 0-1-4
master-bin.000001	1115	Query	1	1242	use `s3_test_b64d55591c8911ebb00c525400123456`; ALTER TABLE t1 ENGINE=S3
master-bin.000001	1242	Gtid	1	1284	GTID 0-1-5
master-bin.000001	1284	Query	1	1428	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION PARTITIONS 6
master-bin.000001	1428	Gtid	1	1470	GTID 0-1-6
master-bin.000001	1470	Query	1	1604	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD COLUMN c INT
master-bin.000001	1604	Gtid	1	1646	GTID 0-1-7
master-bin.000001	1646	Query	1	1800	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
master-bin.000001	1800	Gtid	1	1842	GTID 0-1-8
master-bin.000001	1842	Query	1	2154	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
c1 int primary key,
c2 int DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (c1)
(PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400))
master-bin.000001	2154	Gtid	1	2196	BEGIN GTID 0-1-9
master-bin.000001	2196	Query	1	2352	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t1 select seq*100,seq*100 from seq_1_to_3
master-bin.000001	2352	Xid	1	2383	COMMIT /* xid=155 */
master-bin.000001	2383	Gtid	1	2425	GTID 0-1-10
master-bin.000001	2425	Query	1	2552	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t1 engine=S3
master-bin.000001	2552	Gtid	1	2594	GTID 0-1-11
master-bin.000001	2594	Query	1	2763	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION (PARTITION p4 VALUES LESS THAN (500))
master-bin.000001	2763	Gtid	1	2805	GTID 0-1-12
master-bin.000001	2805	Query	1	2959	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
master-bin.000001	2959	Gtid	1	3001	GTID 0-1-13
master-bin.000001	3001	Query	1	3313	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t2 (
c1 int primary key,
c2 int DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (c1)
(PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400))
master-bin.000001	3313	Gtid	1	3355	BEGIN GTID 0-1-14
master-bin.000001	3355	Query	1	3511	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t2 select seq*100,seq*100 from seq_1_to_3
master-bin.000001	3511	Xid	1	3542	COMMIT /* xid=181 */
master-bin.000001	3542	Gtid	1	3584	GTID 0-1-15
master-bin.000001	3584	Query	1	3711	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t2 engine=S3
master-bin.000001	3711	Gtid	1	3753	GTID 0-1-16
master-bin.000001	3753	Query	1	3877	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; rename table t2 to t1
master-bin.000001	3877	Gtid	1	3919	GTID 0-1-17
master-bin.000001	3919	Query	1	4054	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; alter table t1 drop partition p1
 
**** SHOW RELAYLOG EVENTS on default ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
connection default;
 
 == /dev/shm/var/tmp/analyze-sync_with_master-mysqld.2.err ==
 
############################## default ##############################
 
**** SHOW WARNINGS on default ****
SHOW WARNINGS;
Level	Code	Message
 
**** SELECT replication-related variables on default ****
SELECT NOW(), @@SERVER_ID;
NOW()	@@SERVER_ID
2020-11-01 21:32:17	2
 
**** SHOW SLAVE STATUS on default ****
SHOW SLAVE STATUS;
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	4054
Relay_Log_File	slave-relay-bin.000004
Relay_Log_Pos	1670
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	1033
Last_Error	Error 'Incorrect information in file: './s3_test_b64d55591c8911ebb00c525400123456/t1.frm'' on query. Default database: 's3_test_b64d55591c8911ebb00c525400123456'. Query: 'alter table t1 drop partition p1'
Skip_Counter	0
Exec_Master_Log_Pos	3877
Relay_Log_Space	2667
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	NULL
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	1033
Last_SQL_Error	Error 'Incorrect information in file: './s3_test_b64d55591c8911ebb00c525400123456/t1.frm'' on query. Default database: 's3_test_b64d55591c8911ebb00c525400123456'. Query: 'alter table t1 drop partition p1'
Replicate_Ignore_Server_Ids	
Master_Server_Id	1
Master_SSL_Crl	
Master_SSL_Crlpath	
Using_Gtid	No
Gtid_IO_Pos	
Replicate_Do_Domain_Ids	
Replicate_Ignore_Domain_Ids	
Parallel_Mode	optimistic
SQL_Delay	0
SQL_Remaining_Delay	NULL
Slave_SQL_Running_State	
Slave_DDL_Groups	14
Slave_Non_Transactional_Groups	1
Slave_Transactional_Groups	2
 
**** SHOW MASTER STATUS on default ****
SHOW MASTER STATUS;
File	slave-bin.000001
Position	3876
Binlog_Do_DB	
Binlog_Ignore_DB	
 
**** SHOW SLAVE HOSTS on default ****
SHOW SLAVE HOSTS;
 
**** SHOW PROCESSLIST on default ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
13	system user		NULL	Slave_IO	0	Waiting for master to send event	NULL	0.000
15	root	localhost	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
 
**** SHOW BINARY LOGS on default ****
SHOW BINARY LOGS;
Log_name	File_size
slave-bin.000001	3876
 
**** SHOW BINLOG EVENTS on default ****
binlog_name = 'slave-bin.000001'
SHOW BINLOG EVENTS IN 'slave-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
slave-bin.000001	4	Format_desc	2	256	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
slave-bin.000001	256	Gtid_list	2	285	[]
slave-bin.000001	285	Binlog_checkpoint	2	328	slave-bin.000001
slave-bin.000001	328	Gtid	1	370	GTID 0-1-1
slave-bin.000001	370	Query	1	529	create database s3_test_b64d55591c8911ebb00c525400123456
slave-bin.000001	529	Gtid	1	571	GTID 0-1-2
slave-bin.000001	571	Query	1	761	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
c1 INT DEFAULT NULL
) ENGINE=Aria
PARTITION BY HASH (c1)
PARTITIONS 3
slave-bin.000001	761	Gtid	1	803	BEGIN GTID 0-1-3
slave-bin.000001	803	Query	1	963	use `s3_test_b64d55591c8911ebb00c525400123456`; INSERT INTO t1 VALUE (1), (2), (101), (102), (201), (202)
slave-bin.000001	963	Query	1	1072	COMMIT
slave-bin.000001	1072	Gtid	1	1114	GTID 0-1-4
slave-bin.000001	1114	Query	1	1241	use `s3_test_b64d55591c8911ebb00c525400123456`; ALTER TABLE t1 ENGINE=S3
slave-bin.000001	1241	Gtid	1	1283	GTID 0-1-5
slave-bin.000001	1283	Query	1	1427	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION PARTITIONS 6
slave-bin.000001	1427	Gtid	1	1469	GTID 0-1-6
slave-bin.000001	1469	Query	1	1603	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD COLUMN c INT
slave-bin.000001	1603	Gtid	1	1645	GTID 0-1-7
slave-bin.000001	1645	Query	1	1799	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
slave-bin.000001	1799	Gtid	1	1841	GTID 0-1-8
slave-bin.000001	1841	Query	1	2153	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
c1 int primary key,
c2 int DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (c1)
(PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400))
slave-bin.000001	2153	Gtid	1	2195	BEGIN GTID 0-1-9
slave-bin.000001	2195	Query	1	2351	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t1 select seq*100,seq*100 from seq_1_to_3
slave-bin.000001	2351	Xid	1	2382	COMMIT /* xid=185 */
slave-bin.000001	2382	Gtid	1	2424	GTID 0-1-10
slave-bin.000001	2424	Query	1	2551	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t1 engine=S3
slave-bin.000001	2551	Gtid	1	2593	GTID 0-1-11
slave-bin.000001	2593	Query	1	2762	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION (PARTITION p4 VALUES LESS THAN (500))
slave-bin.000001	2762	Gtid	1	2804	GTID 0-1-12
slave-bin.000001	2804	Query	1	2958	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
slave-bin.000001	2958	Gtid	1	3000	GTID 0-1-13
slave-bin.000001	3000	Query	1	3312	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t2 (
c1 int primary key,
c2 int DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (c1)
(PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400))
slave-bin.000001	3312	Gtid	1	3354	BEGIN GTID 0-1-14
slave-bin.000001	3354	Query	1	3510	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t2 select seq*100,seq*100 from seq_1_to_3
slave-bin.000001	3510	Xid	1	3541	COMMIT /* xid=203 */
slave-bin.000001	3541	Gtid	1	3583	GTID 0-1-15
slave-bin.000001	3583	Query	1	3710	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t2 engine=S3
slave-bin.000001	3710	Gtid	1	3752	GTID 0-1-16
slave-bin.000001	3752	Query	1	3876	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; rename table t2 to t1
 
**** SHOW RELAYLOG EVENTS on default ****
relaylog_name = 'slave-relay-bin.000004'
SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000004';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
slave-relay-bin.000004	4	Format_desc	2	256	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
slave-relay-bin.000004	256	Rotate	1	0	master-bin.000001;pos=2763
slave-relay-bin.000004	304	Format_desc	1	0	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
slave-relay-bin.000004	556	Gtid	1	2805	GTID 0-1-12
slave-relay-bin.000004	598	Query	1	2959	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
slave-relay-bin.000004	752	Gtid	1	3001	GTID 0-1-13
slave-relay-bin.000004	794	Query	1	3313	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t2 (
c1 int primary key,
c2 int DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (c1)
(PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400))
slave-relay-bin.000004	1106	Gtid	1	3355	BEGIN GTID 0-1-14
slave-relay-bin.000004	1148	Query	1	3511	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t2 select seq*100,seq*100 from seq_1_to_3
slave-relay-bin.000004	1304	Xid	1	3542	COMMIT /* xid=181 */
slave-relay-bin.000004	1335	Gtid	1	3584	GTID 0-1-15
slave-relay-bin.000004	1377	Query	1	3711	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t2 engine=S3
slave-relay-bin.000004	1504	Gtid	1	3753	GTID 0-1-16
slave-relay-bin.000004	1546	Query	1	3877	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; rename table t2 to t1
slave-relay-bin.000004	1670	Gtid	1	3919	GTID 0-1-17
slave-relay-bin.000004	1712	Query	1	4054	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; alter table t1 drop partition p1
connection default;
 
 
 
 - saving '/dev/shm/var/log/s3.replication_partition-innodb,mix/' to '/dev/shm/var/log/s3.replication_partition-innodb,mix/'
worker[1] > Restart  - using different config file
worker[1] > Restart  - using different config file
***Warnings generated in error logs during shutdown after running tests: s3.replication_partition
 
2020-11-01 21:32:15 0 [Warning] Plugin 'S3' is of maturity level gamma while the server is stable
2020-11-01 21:32:15 0 [Warning] Plugin 'S3' is of maturity level gamma while the server is stable
2020-11-01 21:32:17 14 [Warning] Slave: Got error 130 "Incorrect file format" from storage engine S3 Error_code: 1030



 Comments   
Comment by Michael Widenius [ 2021-02-08 ]

A few of the failures was because of missing sync_slave_to_master in
the test suite.

However, the biggest reason for most failures was that in case of
ALTER PARTITION the master writes the query to the binary log before
it has updated the .frm and .par files. This causes a problem for an
S3 slave as it will start execute the ALTER PARTITION but get old .frm and
.par files from S3 which causes "open table" to fail, either with an error
or in some case with a crash.

Fixed by moving the writing of the binary log after the ALTER PARTITION
has been committed. This change should not affect any normal database usage.
The main drawback is that if the master fails at the very last stage of
ALTER PARTITION, the table may have changed without the change being
written to the binary log. Before this change, the binary log could
in this case have an ALTER PARTITION statement that was never commited
on the master, which is probably even worse.

This should be fixed as part of the atomic DDL project in 10.6

Comment by Julien Fritsch [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Comment by JiraAutomate [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Generated at Thu Feb 08 09:27:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.