Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27777

Some Galera tests fail on FreeBSD with "Could not find target log"

Details

    Description

      The output below is from a 10.5-based branch. I didn't check if it's reproducible on other versions.

      galera.rpl_row_annotate 'innodb'         [ fail ]
              Test ended at 2022-02-08 15:24:13
       
      CURRENT_TEST: galera.rpl_row_annotate
      mysqltest: At line 40: query 'SHOW BINLOG EVENTS IN 'mysqld-bin.000001' FROM $start_pos' failed: 1220: Error when executing command SHOW BINLOG EVENTS: Could not find target log
      

      Indeed,

      $ ls -l var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin*
      -rw-rw----  1 ec2-user  ec2-user   373 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin.000002
      -rw-rw----  1 ec2-user  ec2-user  1289 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin.000003
      -rw-rw----  1 ec2-user  ec2-user    20 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.2/data/mysqld-bin.index
      

      There is the file for the 1st node:

      $ ls -l var/log/galera.rpl_row_annotate-innodb/mysqld.1/data/mysqld-bin*
      -rw-rw----  1 ec2-user  ec2-user  1289 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.1/data/mysqld-bin.000001
      -rw-rw----  1 ec2-user  ec2-user    20 Feb  8 15:24 var/log/galera.rpl_row_annotate-innodb/mysqld.1/data/mysqld-bin.index
      

      From the error log on the 2nd node:

      2022-02-08 15:24:09 0 [Note] WSREP: (279ad9e2-8591, 'tcp://0.0.0.0:16005') turning message relay requesting off
      WSREP_SST: [INFO] Extracting binlog files: (20220208 15:24:09.N)
      x mysqld-bin.000002
      WSREP_SST: [INFO] Joiner cleanup: rsync PID=60582, stunnel PID=0 (20220208 15:24:09.N)
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - - edited

            The problem doesn't seem to be caused by SST transferring wrong logs.

            At the very beginning of the test it does RESET MASTER on both nodes.
            On the second node, it doesn't work. If I add show binary logs or check mysqld-bin.index file, it shows

            +show binary logs;
            +Log_name	File_size
            +mysqld-bin.000003	329
             RESET MASTER;
            +show binary logs;
            +Log_name	File_size
            +mysqld-bin.000003	329
            

            Adding a sleep after RESET MASTER doesn't help, so it's not a race condition (or if it is, it's longer than 10 seconds which I tried).
            And it cannot be coming from the donor, it's rsync and anyway, the donor only has 2 logs (or even one if I do RESET MASTER on the first node before).

            But at the same time RESET MASTER on the 1st node seems to work as expected.

            elenst Elena Stepanova added a comment - - edited The problem doesn't seem to be caused by SST transferring wrong logs. At the very beginning of the test it does RESET MASTER on both nodes. On the second node, it doesn't work. If I add show binary logs or check mysqld-bin.index file, it shows +show binary logs; +Log_name File_size +mysqld-bin.000003 329 RESET MASTER; +show binary logs; +Log_name File_size +mysqld-bin.000003 329 Adding a sleep after RESET MASTER doesn't help, so it's not a race condition (or if it is, it's longer than 10 seconds which I tried). And it cannot be coming from the donor, it's rsync and anyway, the donor only has 2 logs (or even one if I do RESET MASTER on the first node before). But at the same time RESET MASTER on the 1st node seems to work as expected.

            Further experiments show that RESET does work, but it resets to 000003 instead of the expected 000001:

            SET GLOBAL wsrep_on=0;
            SHOW BINARY LOGS;
            Log_name	File_size
            mysqld-bin.000003	329
            FLUSH BINARY LOGS;
            FLUSH BINARY LOGS;
            SHOW BINARY LOGS;
            Log_name	File_size
            mysqld-bin.000003	377
            mysqld-bin.000004	421
            mysqld-bin.000005	373
            RESET MASTER;
            SHOW BINARY LOGS;
            Log_name	File_size
            mysqld-bin.000003	329
            SET GLOBAL wsrep_on=1;
            

            elenst Elena Stepanova added a comment - Further experiments show that RESET does work, but it resets to 000003 instead of the expected 000001: SET GLOBAL wsrep_on=0; SHOW BINARY LOGS; Log_name File_size mysqld-bin.000003 329 FLUSH BINARY LOGS; FLUSH BINARY LOGS; SHOW BINARY LOGS; Log_name File_size mysqld-bin.000003 377 mysqld-bin.000004 421 mysqld-bin.000005 373 RESET MASTER; SHOW BINARY LOGS; Log_name File_size mysqld-bin.000003 329 SET GLOBAL wsrep_on=1;
            Elkin Andrei Elkin added a comment - - edited

            This is a wsrep case. The failure to reset is caused by a left-over 'mysqld-bin.000002' that survived
            in the data directory at the end of the following galera_cluster.inc heading part:

            --source include/galera_init.inc
            --source include/have_innodb.inc
            --source include/galera_wait_ready.inc
             
            SHOW BINARY LOGS;               # => two files 1,2
            SET GLOBAL wsrep_on=0;
            RESET MASTER;
            
            

            the block's RESET MASTER. SET GLOBAL wsrep_on=0 must be doing something to confuse the following RM.
            In normal replication RM is of course tested extensively to never produce such effect.

            Elkin Andrei Elkin added a comment - - edited This is a wsrep case. The failure to reset is caused by a left-over 'mysqld-bin.000002' that survived in the data directory at the end of the following galera_cluster.inc heading part: --source include/galera_init.inc --source include/have_innodb.inc --source include/galera_wait_ready.inc   SHOW BINARY LOGS; # => two files 1 , 2 SET GLOBAL wsrep_on= 0 ; RESET MASTER; the block's RESET MASTER. SET GLOBAL wsrep_on=0 must be doing something to confuse the following RM. In normal replication RM is of course tested extensively to never produce such effect.

            I can't repeat this on my laptop using Ubuntu

             # On node_2
             connection node_2;
             SET GLOBAL wsrep_on=OFF;
            +show binary logs;
            +Log_name	File_size
            +mysqld-bin.000002	373
            +mysqld-bin.000003	329
             RESET MASTER;
            +show binary logs;
            +Log_name	File_size
            +mysqld-bin.000001	329
             SET GLOBAL wsrep_on=ON;
             # On node_1
             connection node_1;
             SET GLOBAL wsrep_on=OFF;
            +show binary logs;
            +Log_name	File_size
            +mysqld-bin.000001	377
            +mysqld-bin.000002	373
             RESET MASTER;
            +show binary logs;
            +Log_name	File_size
            +mysqld-bin.000001	329
             SET GLOBAL wsrep_on=ON;
            

            jplindst Jan Lindström (Inactive) added a comment - I can't repeat this on my laptop using Ubuntu # On node_2 connection node_2; SET GLOBAL wsrep_on=OFF; +show binary logs; +Log_name File_size +mysqld-bin.000002 373 +mysqld-bin.000003 329 RESET MASTER; +show binary logs; +Log_name File_size +mysqld-bin.000001 329 SET GLOBAL wsrep_on=ON; # On node_1 connection node_1; SET GLOBAL wsrep_on=OFF; +show binary logs; +Log_name File_size +mysqld-bin.000001 377 +mysqld-bin.000002 373 RESET MASTER; +show binary logs; +Log_name File_size +mysqld-bin.000001 329 SET GLOBAL wsrep_on=ON;

            It is now clear that this is because wsrep_sst_rsync script. Binlog files are incorrectly transfered to node_2

            c2-user@freebsd:~/MariaDBEnterprise/mysql-test/var/mysqld.2/data $ more mysqld-bin.index
            ./mysqld-bin.000003
            ec2-user@freebsd:~/MariaDBEnterprise/mysql-test/var $ more mysqld.1/data/mysqld-bin.index
            ./mysqld-bin.000001
            ./mysqld-bin.000002
            

            Binlog file mysqld-bin.000002 is transferred to joiner (node_2) but mysqld-bin.index has incorrect contents. RESET MASTER works as designed and finds 02 and creates next available i.e. 03.

            jplindst Jan Lindström (Inactive) added a comment - It is now clear that this is because wsrep_sst_rsync script. Binlog files are incorrectly transfered to node_2 c2-user@freebsd:~/MariaDBEnterprise/mysql-test/var/mysqld.2/data $ more mysqld-bin.index ./mysqld-bin.000003 ec2-user@freebsd:~/MariaDBEnterprise/mysql-test/var $ more mysqld.1/data/mysqld-bin.index ./mysqld-bin.000001 ./mysqld-bin.000002 Binlog file mysqld-bin.000002 is transferred to joiner (node_2) but mysqld-bin.index has incorrect contents. RESET MASTER works as designed and finds 02 and creates next available i.e. 03.

            jplindst The content of the changes regarding binlogs is common to all systems and has already been fixed in MDEV-27524 + MDEV-27740, as for FreeBSD-specific changes, they have been added in the commit for MDEV-27737 (but I will rename it to MDEV-27777 since it will be more accurate).

            sysprg Julius Goryavsky added a comment - jplindst The content of the changes regarding binlogs is common to all systems and has already been fixed in MDEV-27524 + MDEV-27740 , as for FreeBSD-specific changes, they have been added in the commit for MDEV-27737 (but I will rename it to MDEV-27777 since it will be more accurate).
            sysprg Julius Goryavsky added a comment - Fixed, https://github.com/MariaDB/server/commit/fac9224d85e879ff06e549c18446375d850ef83a

            People

              sysprg Julius Goryavsky
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.