The test appears to always be retry-pass. The result file expects the error message to contain ‘master-bin.000001’, but instead the error log shows the IO thread with an empty string or 'FIRST' for the log file:
2022-12-12 10:53:42 241 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16540' in log '' at position 4
|
2022-12-12 10:53:42 242 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './slave-relay-bin.000001' position: 4
|
2022-12-12 10:53:42 241 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16540',replication started in log 'FIRST' at position 4
|
2022-12-12 10:53:42 241 [ERROR] Error reading packet from server: could not find next log; the first event '.' at 4, the last event read from 'master-bin.000002' at 514, the last byte read from 'master-bin.000002' at 562. (server_errno=1236)
|
This failure was first seen with: https://buildbot.mariadb.org/#/builders/121/builds/6504 and is only seen on 10.10 and higher.
The error log for the failure shows (attached mysqld.2.err.30) that the CHANGE MASTER statement at the start of the test does not contain the filename for the New state:
2022-12-12 10:53:42 239 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='16540', master_log_file='master-bin.000001', master_log_pos='329'. New state master_host='127.0.0.1', master_port='16540', master_log_file='', master_log_pos='4'.
|
2022-12-12 10:53:42 239 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=No
|
before this, when the slave is stopped so CM can be changed to USING_GTID=NO, the IO thread knows of the master log file but not the SQL thread (let’s call this Case A):
2022-12-12 10:53:42 235 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
|
2022-12-12 10:53:42 237 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16540' in log '' at position 4
|
2022-12-12 10:53:42 237 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16540',replication starts at GTID position ''
|
2022-12-12 10:53:42 238 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './slave-relay-bin.000001' position: 4; GTID position ''
|
2022-12-12 10:53:42 238 [Note] Slave SQL thread exiting, replication stopped in log 'FIRST' at position 4; GTID position '', master: 127.0.0.1:16540
|
2022-12-12 10:53:42 237 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329; GTID position , master 127.0.0.1:16540
|
while for the passing test in another build (https://buildbot.mariadb.org/#/builders/208) the error log shows that the CHANGE MASTER statement contains the master_log_file filename:
2023-04-14 16:13:00 239 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='16240', master_log_file='master-bin.000001', master_log_pos='329'. New state master_host='127.0.0.1', master_port='16240', master_log_file='master-bin.000001', master_log_pos='329'.
|
2023-04-14 16:13:00 239 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=No
|
and both IO and SQL threads know of the log file:
2023-04-14 16:13:00 235 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
|
2023-04-14 16:13:00 237 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16240' in log '' at position 4
|
2023-04-14 16:13:00 238 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './slave-relay-bin.000001' position: 4; GTID position ''
|
2023-04-14 16:13:00 237 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16240',replication starts at GTID position ''
|
2023-04-14 16:13:00 238 [Note] Error reading relay log event: slave SQL thread was killed
|
2023-04-14 16:13:00 238 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329; GTID position '', master: 127.0.0.1:16240
|
2023-04-14 16:13:00 237 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329; GTID position , master 127.0.0.1:16240
|
Notably, the error log for the failure does NOT show the message “Error reading relay log event: slave SQL thread was killed”
Elkin I am unsure if there is a timing bug with Case A to investigate. Otherwise, the expected error message can simply be made more general so that it would accept either ‘.’ or ‘master-bin.000001’, or the "USING_GTID=NO" could be removed as the 1236 error is still achievable when the index file is manually changed.
The test appears to always be retry-pass. The result file expects the error message to contain ‘master-bin.000001’, but instead the error log shows the IO thread with an empty string or 'FIRST' for the log file:
2022-12-12 10:53:42 241 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16540' in log '' at position 4
2022-12-12 10:53:42 242 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './slave-relay-bin.000001' position: 4
2022-12-12 10:53:42 241 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16540',replication started in log 'FIRST' at position 4
2022-12-12 10:53:42 241 [ERROR] Error reading packet from server: could not find next log; the first event '.' at 4, the last event read from 'master-bin.000002' at 514, the last byte read from 'master-bin.000002' at 562. (server_errno=1236)
This failure was first seen with: https://buildbot.mariadb.org/#/builders/121/builds/6504 and is only seen on 10.10 and higher.
The error log for the failure shows (attached mysqld.2.err.30) that the CHANGE MASTER statement at the start of the test does not contain the filename for the New state:
2022-12-12 10:53:42 239 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='16540', master_log_file='master-bin.000001', master_log_pos='329'. New state master_host='127.0.0.1', master_port='16540', master_log_file='', master_log_pos='4'.
2022-12-12 10:53:42 239 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=No
before this, when the slave is stopped so CM can be changed to USING_GTID=NO, the IO thread knows of the master log file but not the SQL thread (let’s call this Case A):
2022-12-12 10:53:42 235 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
2022-12-12 10:53:42 237 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16540' in log '' at position 4
2022-12-12 10:53:42 237 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16540',replication starts at GTID position ''
2022-12-12 10:53:42 238 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './slave-relay-bin.000001' position: 4; GTID position ''
2022-12-12 10:53:42 238 [Note] Slave SQL thread exiting, replication stopped in log 'FIRST' at position 4; GTID position '', master: 127.0.0.1:16540
2022-12-12 10:53:42 237 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329; GTID position , master 127.0.0.1:16540
while for the passing test in another build (https://buildbot.mariadb.org/#/builders/208) the error log shows that the CHANGE MASTER statement contains the master_log_file filename:
2023-04-14 16:13:00 239 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='16240', master_log_file='master-bin.000001', master_log_pos='329'. New state master_host='127.0.0.1', master_port='16240', master_log_file='master-bin.000001', master_log_pos='329'.
2023-04-14 16:13:00 239 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=No
and both IO and SQL threads know of the log file:
2023-04-14 16:13:00 235 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
2023-04-14 16:13:00 237 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16240' in log '' at position 4
2023-04-14 16:13:00 238 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './slave-relay-bin.000001' position: 4; GTID position ''
2023-04-14 16:13:00 237 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16240',replication starts at GTID position ''
2023-04-14 16:13:00 238 [Note] Error reading relay log event: slave SQL thread was killed
2023-04-14 16:13:00 238 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329; GTID position '', master: 127.0.0.1:16240
2023-04-14 16:13:00 237 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329; GTID position , master 127.0.0.1:16240
Notably, the error log for the failure does NOT show the message “Error reading relay log event: slave SQL thread was killed”
Elkin I am unsure if there is a timing bug with Case A to investigate. Otherwise, the expected error message can simply be made more general so that it would accept either ‘.’ or ‘master-bin.000001’, or the "USING_GTID=NO" could be removed as the 1236 error is still achievable when the index file is manually changed.