[MDEV-11491] binlog_encryption.rpl_checksum fails sporadically in buildbot Created: 2016-12-06  Updated: 2021-11-04  Resolved: 2016-12-08

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.1
Fix Version/s: 10.1.20

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-7069 Fix buildbot failures in main server ... Stalled
relates to MDEV-16951 binlog_encryption.rpl_checksum failed... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/1900/steps/test/logs/stdio

binlog_encryption.rpl_checksum 'mix'     w4 [ fail ]
        Test ended at 2016-12-05 13:58:46
 
CURRENT_TEST: binlog_encryption.rpl_checksum
--- /home/buildbot/maria-slave/power8-vlp03-bintar-debug/build/mysql-test/suite/binlog_encryption/rpl_checksum.result	2016-12-05 13:18:06.104205208 -0500
+++ /home/buildbot/maria-slave/power8-vlp03-bintar-debug/build/mysql-test/suite/binlog_encryption/rpl_checksum.reject	2016-12-05 13:58:45.454421184 -0500
@@ -71,7 +71,7 @@
 set @@global.debug_dbug='d,simulate_slave_unaware_checksum';
 start slave;
 include/wait_for_slave_io_error.inc [errno=1236]
-Last_IO_Error = 'Got fatal error 1236 from master when reading data from binary log: 'Slave can not handle replication events with the checksum that master is configured to log; the first event 'master-bin.000009' at 404, the last event read from 'master-bin.000010' at 4, the last byte read from 'master-bin.000010' at 249.''
+Last_IO_Error = 'Got fatal error 1236 from master when reading data from binary log: 'Slave can not handle replication events with the checksum that master is configured to log; the first event 'master-bin.000009' at 444, the last event read from 'master-bin.000010' at 4, the last byte read from 'master-bin.000010' at 249.''
 select count(*) as zero from t1;
 zero
 0
 
mysqltest: Result content mismatch



 Comments   
Comment by Elena Stepanova [ 2016-12-08 ]

I can get this error quite reliably with this crude injection:

diff --git a/mysql-test/extra/rpl_tests/rpl_checksum.inc b/mysql-test/extra/rpl_tests/rpl_checksum.inc
index 14664fd..44d3776 100644
--- a/mysql-test/extra/rpl_tests/rpl_checksum.inc
+++ b/mysql-test/extra/rpl_tests/rpl_checksum.inc
@@ -15,6 +15,9 @@ call mtr.add_suppression('Slave can not handle replication events with the check
 call mtr.add_suppression('Replication event checksum verification failed');
 # due to C failure simulation
 call mtr.add_suppression('Relay log write failure: could not queue event from master');
+--disable_query_log
+set @@global.debug_dbug='d,mdev11491';
+--enable_query_log
 call mtr.add_suppression('Master is configured to log replication events with checksum, but will not send such events to slaves that cannot process them');
 
 # A. read/write access to the global vars:
@@ -88,6 +91,10 @@ create table t1 (a int);
 flush logs;
 flush logs;
 flush logs;
+--sleep 10
+--disable_query_log
+set @@global.debug_dbug='';
+--enable_query_log
 
 sync_slave_with_master;
 #connection slave;
diff --git a/sql/log.cc b/sql/log.cc
index 45ab5c8..6e0a87b 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -9630,6 +9630,7 @@ binlog_background_thread(void *arg __attribute__((unused)))
       DEBUG_SYNC(thd, "binlog_background_thread_before_mark_xid_done");
       /* Grab next pointer first, as mark_xid_done() may free the element. */
       next= queue->next_in_queue;
+      DBUG_EXECUTE_IF("mdev11491", { my_sleep(1000000); };);
       mysql_bin_log.mark_xid_done(queue->binlog_id, true);
       queue= next;
 

So, it's a race condition for really slow systems (disks?) when mark_xid_done is considerably delayed.
We need to ensure that it happens before we rotate to the log which for which the position is later printed in the error message.

Comment by Elena Stepanova [ 2016-12-08 ]

https://github.com/MariaDB/server/commit/870d7589c6c4596b0f1a641add5071469e2c94a7

Generated at Thu Feb 08 07:50:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.