MariaDB Server

MDEV-36105: Binlog expiry broken when slave_connections_needed_for_purge > 0

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version: 11.4.5
    • Fix Version: N/A
    • Component: Replication
    • Labels: None

    Description

      If, at primary server startup, there are binary logs eligible for purging (older than the expire_logs_days / binlog_expire_logs_seconds threshold) and slave_connections_needed_for_purge > 0, the server appears not to purge them, regardless of whether a connecting replica has already processed the events. Binlog expiry never occurs after the note quoted below is shown.
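
      The settings involved can be checked on the primary beforehand; a minimal sketch in plain SQL (variable names as used in this report):

        SHOW GLOBAL VARIABLES LIKE 'slave_connections_needed_for_purge';
        SHOW GLOBAL VARIABLES LIKE 'binlog_expire_logs_seconds';
        SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';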

      Steps to replicate:

      • Set up replication between two servers
      • Create some events on the primary and flush logs

        CREATE DATABASE testing;
        CREATE TABLE testing.t1 (id int);
        INSERT INTO testing.t1 VALUES (1);
        FLUSH BINARY LOGS;
        

      • Ensure the replica has processed all events from the primary. The replica's Master_Log_File should match the file shown by SHOW MASTER STATUS on the primary:

        mariadb -e "show all slaves status\G"|grep \ Master_Log_File
        

      • Set binlog_expire_logs_seconds=10 on the primary and restart it (a minimal config sketch follows this list)
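
      A minimal option-file sketch for that last step, assuming the settings are placed in the server's config file so they survive the restart (the [mariadbd] group name, and explicitly setting slave_connections_needed_for_purge to a value > 0 as in the report, are illustrative assumptions):

        [mariadbd]
        binlog_expire_logs_seconds = 10
        slave_connections_needed_for_purge = 1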

      At this point the primary should show a note similar to the following in the error log:

      2025-02-17 11:23:34 0 [Note] Binary log 'rocky9-MariaDB1-bin.000001' is not purged because less than 'slave_connections_needed_for_purge' slaves have processed it
      

      Binary logs are never expired after this note is shown, though a manual PURGE BINARY LOGS will work.
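
      A manual purge as mentioned above might look like this (plain SQL; the target log file name is hypothetical, and both forms are standard syntax):

        PURGE BINARY LOGS TO 'rocky9-MariaDB1-bin.000003';
        -- or, by age:
        PURGE BINARY LOGS BEFORE NOW() - INTERVAL 1 DAY;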

        Activity

          elenst Elena Stepanova added a comment:

          I can reproduce it, for example like this:

          --source include/master-slave.inc
           
          CREATE DATABASE testing;
          CREATE TABLE testing.t1 (id varchar(8000));
          INSERT INTO testing.t1 VALUES('a');
          FLUSH BINARY LOGS;
           
          --sync_slave_with_master
           
          --connection master
          SHOW BINARY LOGS;
          --let $rpl_server_number= 1
          --let $rpl_server_parameters= --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1
          --sleep 2
          --source include/rpl_restart_server.inc
           
          --connection master
          SHOW BINARY LOGS;
           
          # ...
          

          Result (11.4.5):

          include/rpl_restart_server.inc [server_number=1 parameters: --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1]
          connection master;
          SHOW BINARY LOGS;
          Log_name	File_size
          master-bin.000001	888
          master-bin.000002	410
          master-bin.000003	343
          

          However, I don't see this as much of a problem. It's understandable that if the replica is not connected at the moment the log is due to be purged (here, at primary restart, because it takes the replica some time to reconnect), that opportunity is missed. But it's only a temporary situation: the log won't stay there forever. It will be purged automatically at the next opportunity, be it a normal rotation (when 000003 exceeds its size limit and rotates to 000004) or an explicit FLUSH BINARY LOGS.
          For example:

          --source include/master-slave.inc
           
          CREATE DATABASE testing;
          CREATE TABLE testing.t1 (id varchar(8000));
          INSERT INTO testing.t1 VALUES('a');
          FLUSH BINARY LOGS;
           
          --sync_slave_with_master
           
          --connection master
          SHOW BINARY LOGS;
          --let $rpl_server_number= 1
          --let $rpl_server_parameters= --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1
          --sleep 2
          --source include/rpl_restart_server.inc
           
          --connection master
          SET binlog_format=ROW;
           
          --sync_slave_with_master
           
          --connection master
          SHOW BINARY LOGS;
          INSERT INTO testing.t1 VALUES (REPEAT('a',8000));
          SHOW BINARY LOGS;
           
          DROP DATABASE testing;
           
          --source include/rpl_end.inc
          

          include/rpl_restart_server.inc [server_number=1 parameters: --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1]
          connection master;
          SET binlog_format=ROW;
          connection slave;
          connection master;
          SHOW BINARY LOGS;
          Log_name	File_size
          master-bin.000001	888
          master-bin.000002	410
          master-bin.000003	343
          INSERT INTO testing.t1 VALUES (REPEAT('a',8000));
          SHOW BINARY LOGS;
          Log_name	File_size
          master-bin.000003	8663
          master-bin.000004	387
          DROP DATABASE testing;
          

          So yes, 000001 is preserved right after the restart because the replica hasn't reconnected yet, but once the connection is re-established and normal replication resumes, it gets purged as usual.

          Assigning to bnestere for further evaluation.

          Elkin Andrei Elkin added a comment (edited):

          As elenst's test shows, there is no issue with purging, except that it was not intuitively clear that SHOW BINARY LOGS would still list something like master-bin.000003 (8663 bytes) at a time when the slave has already processed that log.

          However, the presence of master-bin.000003 among the active logs is justified by the documentation:

          Possible purges happen at startup and at binary log rotation.

          That is, even though the slave is in sync, for that specific log to be purged the master has to rotate the binlog, e.g. through an explicit FLUSH LOGS.
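
          For completeness, a minimal sketch of forcing such a rotation on the master (FLUSH BINARY LOGS rotates only the binary log, whereas the FLUSH LOGS mentioned above flushes and rotates all logs):

          FLUSH BINARY LOGS;
          SHOW BINARY LOGS;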

          Furthermore, the description did not mention that notes such as
          Binary log 'binlog-m1.000001' is not purged because...
          are also thrown when slave_connections_needed_for_purge changes its value.
          This apparently needs documenting, and I suggest we close the ticket with that.

          Elkin Andrei Elkin added a comment:

          The lack of the expected purge is explained above. In the same comments, the related subject of the NOTE that appears in the master error log is addressed; it is covered in the KB page for slave_connections_needed_for_purge.
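
          For reference, a sketch of relaxing the requirement at runtime, in line with the report's observation that the behavior only occurs with slave_connections_needed_for_purge > 0 (assuming the variable can be changed dynamically; otherwise it can be set in the option file):

          SET GLOBAL slave_connections_needed_for_purge = 0;
          SHOW GLOBAL VARIABLES LIKE 'slave_connections_needed_for_purge';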


          People

            Assignee: Elkin Andrei Elkin
            Reporter: Ali.maria Alasdair Haswell
            Votes: 0
            Watchers: 5

