Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24087

s3.replication_partition fails in buildbot wiht replication failure

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-fedora32-amd64/builds/621/steps/mtr-s3/logs/stdio

      10.5 8e1e2856f2523c225a81840059e93fa9f61fbacf

      s3.replication_partition 'innodb,mix'    [ fail ]
              Test ended at 2020-11-01 21:32:17
       
      CURRENT_TEST: s3.replication_partition
      analyze: sync_with_master
      mysqltest: At line 106: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 4054, 300, '')' returned NULL indicating slave SQL thread failure
       
      The result from queries just before the failure was:
      < snip >
      select sum(c1) from t1;
      ERROR 42S02: Table 'database.t1' doesn't exist
      start slave;
      connection master;
      #
      # Check altering partitioned table to S3 and back
      # Checks also rename partitoned table and drop partition
      #
      CREATE TABLE t2 (
      c1 int primary key,
      c2 int DEFAULT NULL
      ) ENGINE=InnoDB
      PARTITION BY RANGE (c1)
      (PARTITION p1 VALUES LESS THAN (200),
      PARTITION p2 VALUES LESS THAN (300),
      PARTITION p3 VALUES LESS THAN (400));
      insert into t2 select seq*100,seq*100 from seq_1_to_3;
      alter table t2 engine=S3;
      rename table t2 to t1;
      alter table t1 drop partition p1;
       
      More results from queries before failure can be found in /dev/shm/var/log/replication_partition.log
       
       
       == /dev/shm/var/tmp/analyze-sync_with_master-mysqld.1.err ==
       
      ############################## default ##############################
       
      **** SHOW WARNINGS on default ****
      SHOW WARNINGS;
      Level	Code	Message
       
      **** SELECT replication-related variables on default ****
      SELECT NOW(), @@SERVER_ID;
      NOW()	@@SERVER_ID
      2020-11-01 21:32:17	1
       
      **** SHOW SLAVE STATUS on default ****
      SHOW SLAVE STATUS;
       
      **** SHOW MASTER STATUS on default ****
      SHOW MASTER STATUS;
      File	master-bin.000001
      Position	4054
      Binlog_Do_DB	
      Binlog_Ignore_DB	
       
      **** SHOW SLAVE HOSTS on default ****
      SHOW SLAVE HOSTS;
      Server_id	2
      Host	127.0.0.1
      Port	16001
      Master_id	1
       
      **** SHOW PROCESSLIST on default ****
      SHOW PROCESSLIST;
      Id	User	Host	db	Command	Time	State	Info	Progress
      12	root	localhost:55242	NULL	Binlog Dump	0	Master has sent all binlog to slave; waiting for more updates	NULL	0.000
      13	root	localhost	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
       
      **** SHOW BINARY LOGS on default ****
      SHOW BINARY LOGS;
      Log_name	File_size
      master-bin.000001	4054
       
      **** SHOW BINLOG EVENTS on default ****
      binlog_name = 'master-bin.000001'
      SHOW BINLOG EVENTS IN 'master-bin.000001';
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      master-bin.000001	4	Format_desc	1	256	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
      master-bin.000001	256	Gtid_list	1	285	[]
      master-bin.000001	285	Binlog_checkpoint	1	329	master-bin.000001
      master-bin.000001	329	Gtid	1	371	GTID 0-1-1
      master-bin.000001	371	Query	1	530	create database s3_test_b64d55591c8911ebb00c525400123456
      master-bin.000001	530	Gtid	1	572	GTID 0-1-2
      master-bin.000001	572	Query	1	762	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
      c1 INT DEFAULT NULL
      ) ENGINE=Aria
      PARTITION BY HASH (c1)
      PARTITIONS 3
      master-bin.000001	762	Gtid	1	804	BEGIN GTID 0-1-3
      master-bin.000001	804	Query	1	964	use `s3_test_b64d55591c8911ebb00c525400123456`; INSERT INTO t1 VALUE (1), (2), (101), (102), (201), (202)
      master-bin.000001	964	Query	1	1073	COMMIT
      master-bin.000001	1073	Gtid	1	1115	GTID 0-1-4
      master-bin.000001	1115	Query	1	1242	use `s3_test_b64d55591c8911ebb00c525400123456`; ALTER TABLE t1 ENGINE=S3
      master-bin.000001	1242	Gtid	1	1284	GTID 0-1-5
      master-bin.000001	1284	Query	1	1428	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION PARTITIONS 6
      master-bin.000001	1428	Gtid	1	1470	GTID 0-1-6
      master-bin.000001	1470	Query	1	1604	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD COLUMN c INT
      master-bin.000001	1604	Gtid	1	1646	GTID 0-1-7
      master-bin.000001	1646	Query	1	1800	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
      master-bin.000001	1800	Gtid	1	1842	GTID 0-1-8
      master-bin.000001	1842	Query	1	2154	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
      c1 int primary key,
      c2 int DEFAULT NULL
      ) ENGINE=InnoDB
      PARTITION BY RANGE (c1)
      (PARTITION p1 VALUES LESS THAN (200),
      PARTITION p2 VALUES LESS THAN (300),
      PARTITION p3 VALUES LESS THAN (400))
      master-bin.000001	2154	Gtid	1	2196	BEGIN GTID 0-1-9
      master-bin.000001	2196	Query	1	2352	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t1 select seq*100,seq*100 from seq_1_to_3
      master-bin.000001	2352	Xid	1	2383	COMMIT /* xid=155 */
      master-bin.000001	2383	Gtid	1	2425	GTID 0-1-10
      master-bin.000001	2425	Query	1	2552	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t1 engine=S3
      master-bin.000001	2552	Gtid	1	2594	GTID 0-1-11
      master-bin.000001	2594	Query	1	2763	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION (PARTITION p4 VALUES LESS THAN (500))
      master-bin.000001	2763	Gtid	1	2805	GTID 0-1-12
      master-bin.000001	2805	Query	1	2959	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
      master-bin.000001	2959	Gtid	1	3001	GTID 0-1-13
      master-bin.000001	3001	Query	1	3313	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t2 (
      c1 int primary key,
      c2 int DEFAULT NULL
      ) ENGINE=InnoDB
      PARTITION BY RANGE (c1)
      (PARTITION p1 VALUES LESS THAN (200),
      PARTITION p2 VALUES LESS THAN (300),
      PARTITION p3 VALUES LESS THAN (400))
      master-bin.000001	3313	Gtid	1	3355	BEGIN GTID 0-1-14
      master-bin.000001	3355	Query	1	3511	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t2 select seq*100,seq*100 from seq_1_to_3
      master-bin.000001	3511	Xid	1	3542	COMMIT /* xid=181 */
      master-bin.000001	3542	Gtid	1	3584	GTID 0-1-15
      master-bin.000001	3584	Query	1	3711	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t2 engine=S3
      master-bin.000001	3711	Gtid	1	3753	GTID 0-1-16
      master-bin.000001	3753	Query	1	3877	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; rename table t2 to t1
      master-bin.000001	3877	Gtid	1	3919	GTID 0-1-17
      master-bin.000001	3919	Query	1	4054	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; alter table t1 drop partition p1
       
      **** SHOW RELAYLOG EVENTS on default ****
      relaylog_name = 'No such row'
      SHOW RELAYLOG EVENTS IN 'No such row';
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      connection default;
       
       == /dev/shm/var/tmp/analyze-sync_with_master-mysqld.2.err ==
       
      ############################## default ##############################
       
      **** SHOW WARNINGS on default ****
      SHOW WARNINGS;
      Level	Code	Message
       
      **** SELECT replication-related variables on default ****
      SELECT NOW(), @@SERVER_ID;
      NOW()	@@SERVER_ID
      2020-11-01 21:32:17	2
       
      **** SHOW SLAVE STATUS on default ****
      SHOW SLAVE STATUS;
      Slave_IO_State	Waiting for master to send event
      Master_Host	127.0.0.1
      Master_User	root
      Master_Port	16000
      Connect_Retry	1
      Master_Log_File	master-bin.000001
      Read_Master_Log_Pos	4054
      Relay_Log_File	slave-relay-bin.000004
      Relay_Log_Pos	1670
      Relay_Master_Log_File	master-bin.000001
      Slave_IO_Running	Yes
      Slave_SQL_Running	No
      Replicate_Do_DB	
      Replicate_Ignore_DB	
      Replicate_Do_Table	
      Replicate_Ignore_Table	
      Replicate_Wild_Do_Table	
      Replicate_Wild_Ignore_Table	
      Last_Errno	1033
      Last_Error	Error 'Incorrect information in file: './s3_test_b64d55591c8911ebb00c525400123456/t1.frm'' on query. Default database: 's3_test_b64d55591c8911ebb00c525400123456'. Query: 'alter table t1 drop partition p1'
      Skip_Counter	0
      Exec_Master_Log_Pos	3877
      Relay_Log_Space	2667
      Until_Condition	None
      Until_Log_File	
      Until_Log_Pos	0
      Master_SSL_Allowed	No
      Master_SSL_CA_File	
      Master_SSL_CA_Path	
      Master_SSL_Cert	
      Master_SSL_Cipher	
      Master_SSL_Key	
      Seconds_Behind_Master	NULL
      Master_SSL_Verify_Server_Cert	No
      Last_IO_Errno	0
      Last_IO_Error	
      Last_SQL_Errno	1033
      Last_SQL_Error	Error 'Incorrect information in file: './s3_test_b64d55591c8911ebb00c525400123456/t1.frm'' on query. Default database: 's3_test_b64d55591c8911ebb00c525400123456'. Query: 'alter table t1 drop partition p1'
      Replicate_Ignore_Server_Ids	
      Master_Server_Id	1
      Master_SSL_Crl	
      Master_SSL_Crlpath	
      Using_Gtid	No
      Gtid_IO_Pos	
      Replicate_Do_Domain_Ids	
      Replicate_Ignore_Domain_Ids	
      Parallel_Mode	optimistic
      SQL_Delay	0
      SQL_Remaining_Delay	NULL
      Slave_SQL_Running_State	
      Slave_DDL_Groups	14
      Slave_Non_Transactional_Groups	1
      Slave_Transactional_Groups	2
       
      **** SHOW MASTER STATUS on default ****
      SHOW MASTER STATUS;
      File	slave-bin.000001
      Position	3876
      Binlog_Do_DB	
      Binlog_Ignore_DB	
       
      **** SHOW SLAVE HOSTS on default ****
      SHOW SLAVE HOSTS;
       
      **** SHOW PROCESSLIST on default ****
      SHOW PROCESSLIST;
      Id	User	Host	db	Command	Time	State	Info	Progress
      13	system user		NULL	Slave_IO	0	Waiting for master to send event	NULL	0.000
      15	root	localhost	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
       
      **** SHOW BINARY LOGS on default ****
      SHOW BINARY LOGS;
      Log_name	File_size
      slave-bin.000001	3876
       
      **** SHOW BINLOG EVENTS on default ****
      binlog_name = 'slave-bin.000001'
      SHOW BINLOG EVENTS IN 'slave-bin.000001';
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      slave-bin.000001	4	Format_desc	2	256	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
      slave-bin.000001	256	Gtid_list	2	285	[]
      slave-bin.000001	285	Binlog_checkpoint	2	328	slave-bin.000001
      slave-bin.000001	328	Gtid	1	370	GTID 0-1-1
      slave-bin.000001	370	Query	1	529	create database s3_test_b64d55591c8911ebb00c525400123456
      slave-bin.000001	529	Gtid	1	571	GTID 0-1-2
      slave-bin.000001	571	Query	1	761	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
      c1 INT DEFAULT NULL
      ) ENGINE=Aria
      PARTITION BY HASH (c1)
      PARTITIONS 3
      slave-bin.000001	761	Gtid	1	803	BEGIN GTID 0-1-3
      slave-bin.000001	803	Query	1	963	use `s3_test_b64d55591c8911ebb00c525400123456`; INSERT INTO t1 VALUE (1), (2), (101), (102), (201), (202)
      slave-bin.000001	963	Query	1	1072	COMMIT
      slave-bin.000001	1072	Gtid	1	1114	GTID 0-1-4
      slave-bin.000001	1114	Query	1	1241	use `s3_test_b64d55591c8911ebb00c525400123456`; ALTER TABLE t1 ENGINE=S3
      slave-bin.000001	1241	Gtid	1	1283	GTID 0-1-5
      slave-bin.000001	1283	Query	1	1427	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION PARTITIONS 6
      slave-bin.000001	1427	Gtid	1	1469	GTID 0-1-6
      slave-bin.000001	1469	Query	1	1603	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD COLUMN c INT
      slave-bin.000001	1603	Gtid	1	1645	GTID 0-1-7
      slave-bin.000001	1645	Query	1	1799	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
      slave-bin.000001	1799	Gtid	1	1841	GTID 0-1-8
      slave-bin.000001	1841	Query	1	2153	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t1 (
      c1 int primary key,
      c2 int DEFAULT NULL
      ) ENGINE=InnoDB
      PARTITION BY RANGE (c1)
      (PARTITION p1 VALUES LESS THAN (200),
      PARTITION p2 VALUES LESS THAN (300),
      PARTITION p3 VALUES LESS THAN (400))
      slave-bin.000001	2153	Gtid	1	2195	BEGIN GTID 0-1-9
      slave-bin.000001	2195	Query	1	2351	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t1 select seq*100,seq*100 from seq_1_to_3
      slave-bin.000001	2351	Xid	1	2382	COMMIT /* xid=185 */
      slave-bin.000001	2382	Gtid	1	2424	GTID 0-1-10
      slave-bin.000001	2424	Query	1	2551	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t1 engine=S3
      slave-bin.000001	2551	Gtid	1	2593	GTID 0-1-11
      slave-bin.000001	2593	Query	1	2762	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; ALTER TABLE t1 ADD PARTITION (PARTITION p4 VALUES LESS THAN (500))
      slave-bin.000001	2762	Gtid	1	2804	GTID 0-1-12
      slave-bin.000001	2804	Query	1	2958	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
      slave-bin.000001	2958	Gtid	1	3000	GTID 0-1-13
      slave-bin.000001	3000	Query	1	3312	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t2 (
      c1 int primary key,
      c2 int DEFAULT NULL
      ) ENGINE=InnoDB
      PARTITION BY RANGE (c1)
      (PARTITION p1 VALUES LESS THAN (200),
      PARTITION p2 VALUES LESS THAN (300),
      PARTITION p3 VALUES LESS THAN (400))
      slave-bin.000001	3312	Gtid	1	3354	BEGIN GTID 0-1-14
      slave-bin.000001	3354	Query	1	3510	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t2 select seq*100,seq*100 from seq_1_to_3
      slave-bin.000001	3510	Xid	1	3541	COMMIT /* xid=203 */
      slave-bin.000001	3541	Gtid	1	3583	GTID 0-1-15
      slave-bin.000001	3583	Query	1	3710	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t2 engine=S3
      slave-bin.000001	3710	Gtid	1	3752	GTID 0-1-16
      slave-bin.000001	3752	Query	1	3876	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; rename table t2 to t1
       
      **** SHOW RELAYLOG EVENTS on default ****
      relaylog_name = 'slave-relay-bin.000004'
      SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000004';
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      slave-relay-bin.000004	4	Format_desc	2	256	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
      slave-relay-bin.000004	256	Rotate	1	0	master-bin.000001;pos=2763
      slave-relay-bin.000004	304	Format_desc	1	0	Server ver: 10.5.7-MariaDB-log, Binlog ver: 4
      slave-relay-bin.000004	556	Gtid	1	2805	GTID 0-1-12
      slave-relay-bin.000004	598	Query	1	2959	use `s3_test_b64d55591c8911ebb00c525400123456`; DROP TABLE IF EXISTS `t1` /* generated by server */
      slave-relay-bin.000004	752	Gtid	1	3001	GTID 0-1-13
      slave-relay-bin.000004	794	Query	1	3313	use `s3_test_b64d55591c8911ebb00c525400123456`; CREATE TABLE t2 (
      c1 int primary key,
      c2 int DEFAULT NULL
      ) ENGINE=InnoDB
      PARTITION BY RANGE (c1)
      (PARTITION p1 VALUES LESS THAN (200),
      PARTITION p2 VALUES LESS THAN (300),
      PARTITION p3 VALUES LESS THAN (400))
      slave-relay-bin.000004	1106	Gtid	1	3355	BEGIN GTID 0-1-14
      slave-relay-bin.000004	1148	Query	1	3511	use `s3_test_b64d55591c8911ebb00c525400123456`; insert into t2 select seq*100,seq*100 from seq_1_to_3
      slave-relay-bin.000004	1304	Xid	1	3542	COMMIT /* xid=181 */
      slave-relay-bin.000004	1335	Gtid	1	3584	GTID 0-1-15
      slave-relay-bin.000004	1377	Query	1	3711	use `s3_test_b64d55591c8911ebb00c525400123456`; alter table t2 engine=S3
      slave-relay-bin.000004	1504	Gtid	1	3753	GTID 0-1-16
      slave-relay-bin.000004	1546	Query	1	3877	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; rename table t2 to t1
      slave-relay-bin.000004	1670	Gtid	1	3919	GTID 0-1-17
      slave-relay-bin.000004	1712	Query	1	4054	use `s3_test_b64d55591c8911ebb00c525400123456`; set @@sql_if_exists=1; alter table t1 drop partition p1
      connection default;
       
       
       
       - saving '/dev/shm/var/log/s3.replication_partition-innodb,mix/' to '/dev/shm/var/log/s3.replication_partition-innodb,mix/'
      worker[1] > Restart  - using different config file
      worker[1] > Restart  - using different config file
      ***Warnings generated in error logs during shutdown after running tests: s3.replication_partition
       
      2020-11-01 21:32:15 0 [Warning] Plugin 'S3' is of maturity level gamma while the server is stable
      2020-11-01 21:32:15 0 [Warning] Plugin 'S3' is of maturity level gamma while the server is stable
      2020-11-01 21:32:17 14 [Warning] Slave: Got error 130 "Incorrect file format" from storage engine S3 Error_code: 1030
      

      Attachments

        Activity

          monty Michael Widenius added a comment - - edited

          A few of the failures was because of missing sync_slave_to_master in
          the test suite.

          However, the biggest reason for most failures was that in case of
          ALTER PARTITION the master writes the query to the binary log before
          it has updated the .frm and .par files. This causes a problem for an
          S3 slave as it will start execute the ALTER PARTITION but get old .frm and
          .par files from S3 which causes "open table" to fail, either with an error
          or in some case with a crash.

          Fixed by moving the writing of the binary log after the ALTER PARTITION
          has been committed. This change should not affect any normal database usage.
          The main drawback is that if the master fails at the very last stage of
          ALTER PARTITION, the table may have changed without the change being
          written to the binary log. Before this change, the binary log could
          in this case have an ALTER PARTITION statement that was never commited
          on the master, which is probably even worse.

          This should be fixed as part of the atomic DDL project in 10.6

          monty Michael Widenius added a comment - - edited A few of the failures was because of missing sync_slave_to_master in the test suite. However, the biggest reason for most failures was that in case of ALTER PARTITION the master writes the query to the binary log before it has updated the .frm and .par files. This causes a problem for an S3 slave as it will start execute the ALTER PARTITION but get old .frm and .par files from S3 which causes "open table" to fail, either with an error or in some case with a crash. Fixed by moving the writing of the binary log after the ALTER PARTITION has been committed. This change should not affect any normal database usage. The main drawback is that if the master fails at the very last stage of ALTER PARTITION, the table may have changed without the change being written to the binary log. Before this change, the binary log could in this case have an ALTER PARTITION statement that was never commited on the master, which is probably even worse. This should be fixed as part of the atomic DDL project in 10.6

          Automated message:
          ----------------------------
          Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

          julien.fritsch Julien Fritsch added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.
          JIraAutomate JiraAutomate added a comment -

          Automated message:
          ----------------------------
          Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

          JIraAutomate JiraAutomate added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

          People

            monty Michael Widenius
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.