[MDEV-27777] Some Galera tests fail on FreeBSD with "Could not find target log" Created: 2022-02-08  Updated: 2022-03-22  Resolved: 2022-02-22

Status: Closed
Project: MariaDB Server
Component/s: Galera SST, Platform FreeBSD, Tests, wsrep
Affects Version/s: 10.5
Fix Version/s: 10.9.0, 10.2.44, 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-27524 Incorrect binlogs after Galera SST us... Closed
Problem/Incident
is caused by MDEV-27524 Incorrect binlogs after Galera SST us... Closed
Relates
relates to MDEV-27737 Wsrep SST scripts not working on FreeBSD Closed

 Description   

The output below is from a 10.5-based branch. I didn't check if it's reproducible on other versions.

galera.rpl_row_annotate 'innodb'         [ fail ]
        Test ended at 2022-02-08 15:24:13
 
CURRENT_TEST: galera.rpl_row_annotate
mysqltest: At line 40: query 'SHOW BINLOG EVENTS IN 'mysqld-bin.000001' FROM $start_pos' failed: 1220: Error when executing command SHOW BINLOG EVENTS: Could not find target log

Indeed,

$ ls -l var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin*
-rw-rw----  1 ec2-user  ec2-user   373 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin.000002
-rw-rw----  1 ec2-user  ec2-user  1289 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin.000003
-rw-rw----  1 ec2-user  ec2-user    20 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin.index

There is the file for the 1st node:

$ ls -l var/log/galera.rpl_row_annotate-innodb/mysqld.1/data/mysqld-bin*
-rw-rw----  1 ec2-user  ec2-user  1289 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.1/data/mysqld-bin.000001
-rw-rw----  1 ec2-user  ec2-user    20 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.1/data/mysqld-bin.index

From the error log on the 2nd node:

2022-02-08 15:24:09 0 [Note] WSREP: (279ad9e2-8591, 'tcp://0.0.0.0:16005') turning message relay requesting off
WSREP_SST: [INFO] Extracting binlog files: (20220208 15:24:09.N)
x mysqld-bin.000002
WSREP_SST: [INFO] Joiner cleanup: rsync PID=60582, stunnel PID=0 (20220208 15:24:09.N)



 Comments   
Comment by Elena Stepanova [ 2022-02-08 ]

The problem doesn't seem to be caused by SST transferring wrong logs.

At the very beginning of the test it does RESET MASTER on both nodes.
On the second node, it doesn't work. If I add show binary logs or check mysqld-bin.index file, it shows

+show binary logs;
+Log_name	File_size
+mysqld-bin.000003	329
 RESET MASTER;
+show binary logs;
+Log_name	File_size
+mysqld-bin.000003	329

Adding a sleep after RESET MASTER doesn't help, so it's not a race condition (or if it is, it's longer than 10 seconds which I tried).
And it cannot be coming from the donor, it's rsync and anyway, the donor only has 2 logs (or even one if I do RESET MASTER on the first node before).

But at the same time RESET MASTER on the 1st node seems to work as expected.

Comment by Elena Stepanova [ 2022-02-08 ]

Further experiments show that RESET does work, but it resets to 000003 instead of the expected 000001:

SET GLOBAL wsrep_on=0;
SHOW BINARY LOGS;
Log_name	File_size
mysqld-bin.000003	329
FLUSH BINARY LOGS;
FLUSH BINARY LOGS;
SHOW BINARY LOGS;
Log_name	File_size
mysqld-bin.000003	377
mysqld-bin.000004	421
mysqld-bin.000005	373
RESET MASTER;
SHOW BINARY LOGS;
Log_name	File_size
mysqld-bin.000003	329
SET GLOBAL wsrep_on=1;

Comment by Andrei Elkin [ 2022-02-08 ]

This is a wsrep case. The failure to reset is caused by a left-over 'mysqld-bin.000002' that survived
in the data directory at the end of the following galera_cluster.inc heading part:

--source include/galera_init.inc
--source include/have_innodb.inc
--source include/galera_wait_ready.inc
 
SHOW BINARY LOGS;               # => two files 1,2
SET GLOBAL wsrep_on=0;
RESET MASTER;

the block's RESET MASTER. SET GLOBAL wsrep_on=0 must be doing something to confuse the following RM.
In normal replication RM is of course tested extensively to never produce such effect.

Comment by Jan Lindström (Inactive) [ 2022-02-09 ]

I can't repeat this on my laptop using Ubuntu

 # On node_2
 connection node_2;
 SET GLOBAL wsrep_on=OFF;
+show binary logs;
+Log_name	File_size
+mysqld-bin.000002	373
+mysqld-bin.000003	329
 RESET MASTER;
+show binary logs;
+Log_name	File_size
+mysqld-bin.000001	329
 SET GLOBAL wsrep_on=ON;
 # On node_1
 connection node_1;
 SET GLOBAL wsrep_on=OFF;
+show binary logs;
+Log_name	File_size
+mysqld-bin.000001	377
+mysqld-bin.000002	373
 RESET MASTER;
+show binary logs;
+Log_name	File_size
+mysqld-bin.000001	329
 SET GLOBAL wsrep_on=ON;

Comment by Jan Lindström (Inactive) [ 2022-02-10 ]

It is now clear that this is because wsrep_sst_rsync script. Binlog files are incorrectly transfered to node_2

c2-user@freebsd:~/MariaDBEnterprise/mysql-test/var/mysqld.2/data $ more mysqld-bin.index
./mysqld-bin.000003
ec2-user@freebsd:~/MariaDBEnterprise/mysql-test/var $ more mysqld.1/data/mysqld-bin.index
./mysqld-bin.000001
./mysqld-bin.000002

Binlog file mysqld-bin.000002 is transferred to joiner (node_2) but mysqld-bin.index has incorrect contents. RESET MASTER works as designed and finds 02 and creates next available i.e. 03.

Comment by Julius Goryavsky [ 2022-02-18 ]

jplindst The content of the changes regarding binlogs is common to all systems and has already been fixed in MDEV-27524 + MDEV-27740, as for FreeBSD-specific changes, they have been added in the commit for MDEV-27737 (but I will rename it to MDEV-27777 since it will be more accurate).

Comment by Julius Goryavsky [ 2022-02-22 ]

Fixed, https://github.com/MariaDB/server/commit/fac9224d85e879ff06e549c18446375d850ef83a

Generated at Thu Feb 08 09:55:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.