MariaDB Server

MDEV-36105: Binlog expiry broken when slave_connections_needed_for_purge > 0

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version: 11.4.5
    • Fix Version: N/A
    • Component: Replication
    • Labels: None

    Description

      If, at primary server startup, there are binary logs eligible for purging (older than the expire_logs_days / binlog_expire_logs_seconds threshold) and slave_connections_needed_for_purge > 0, the server appears not to purge them, regardless of whether a connecting replica has already processed the events. Binlog expiry never occurs after the note quoted below is shown.
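
      The settings involved can be checked on the primary beforehand; a minimal sketch in plain SQL (variable names as used in this report):

        SHOW GLOBAL VARIABLES LIKE 'slave_connections_needed_for_purge';
        SHOW GLOBAL VARIABLES LIKE 'binlog_expire_logs_seconds';
        SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';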

      Steps to replicate:

      • Set up replication between two servers
      • Create some events on the primary and flush logs

        CREATE DATABASE testing;
        CREATE TABLE testing.t1 (id int);
        INSERT INTO testing.t1 VALUES (1);
        FLUSH BINARY LOGS;
        

      • Ensure the replica has processed all events from the primary. The replica's Master_Log_File should match the file shown by SHOW MASTER STATUS on the primary:

        mariadb -e "show all slaves status\G"|grep \ Master_Log_File
        

      • Set binlog_expire_logs_seconds=10 on the primary and restart it (a minimal config sketch follows this list)
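
      A minimal option-file sketch for that last step, assuming the settings are placed in the server's config file so they survive the restart (the [mariadbd] group name, and explicitly setting slave_connections_needed_for_purge to a value > 0 as in the report, are illustrative assumptions):

        [mariadbd]
        binlog_expire_logs_seconds = 10
        slave_connections_needed_for_purge = 1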

      At this point the primary should show a note similar to the following in the error log:

      2025-02-17 11:23:34 0 [Note] Binary log 'rocky9-MariaDB1-bin.000001' is not purged because less than 'slave_connections_needed_for_purge' slaves have processed it
      

      Binary logs are never expired after this note is shown, though a manual PURGE BINARY LOGS will work.
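
      A manual purge as mentioned above might look like this (plain SQL; the target log file name is hypothetical, and both forms are standard syntax):

        PURGE BINARY LOGS TO 'rocky9-MariaDB1-bin.000003';
        -- or, by age:
        PURGE BINARY LOGS BEFORE NOW() - INTERVAL 1 DAY;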

        Activity

          elenst Elena Stepanova added a comment:

          I can reproduce it, for example like this:

          --source include/master-slave.inc
           
          CREATE DATABASE testing;
          CREATE TABLE testing.t1 (id varchar(8000));
          INSERT INTO testing.t1 VALUES('a');
          FLUSH BINARY LOGS;
           
          --sync_slave_with_master
           
          --connection master
          SHOW BINARY LOGS;
          --let $rpl_server_number= 1
          --let $rpl_server_parameters= --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1
          --sleep 2
          --source include/rpl_restart_server.inc
           
          --connection master
          SHOW BINARY LOGS;
           
          # ...
          

          Result (11.4.5):

          include/rpl_restart_server.inc [server_number=1 parameters: --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1]
          connection master;
          SHOW BINARY LOGS;
          Log_name	File_size
          master-bin.000001	888
          master-bin.000002	410
          master-bin.000003	343
          

          However, I don't see this as much of a problem. It's understandable that if the replica is not connected at the moment the log is due to be purged (here, at primary restart, because it takes the replica some time to reconnect), that opportunity is missed. But it's only a temporary situation: the log won't stay there forever. It will be purged automatically at the next opportunity, be it a normal rotation (when 000003 exceeds its size limit and rotates to 000004) or an explicit FLUSH BINARY LOGS.
          For example:

          --source include/master-slave.inc
           
          CREATE DATABASE testing;
          CREATE TABLE testing.t1 (id varchar(8000));
          INSERT INTO testing.t1 VALUES('a');
          FLUSH BINARY LOGS;
           
          --sync_slave_with_master
           
          --connection master
          SHOW BINARY LOGS;
          --let $rpl_server_number= 1
          --let $rpl_server_parameters= --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1
          --sleep 2
          --source include/rpl_restart_server.inc
           
          --connection master
          SET binlog_format=ROW;
           
          --sync_slave_with_master
           
          --connection master
          SHOW BINARY LOGS;
          INSERT INTO testing.t1 VALUES (REPEAT('a',8000));
          SHOW BINARY LOGS;
           
          DROP DATABASE testing;
           
          --source include/rpl_end.inc
          

          include/rpl_restart_server.inc [server_number=1 parameters: --binlog_expire_logs_seconds=2 --slave_connections_needed_for_purge=1]
          connection master;
          SET binlog_format=ROW;
          connection slave;
          connection master;
          SHOW BINARY LOGS;
          Log_name	File_size
          master-bin.000001	888
          master-bin.000002	410
          master-bin.000003	343
          INSERT INTO testing.t1 VALUES (REPEAT('a',8000));
          SHOW BINARY LOGS;
          Log_name	File_size
          master-bin.000003	8663
          master-bin.000004	387
          DROP DATABASE testing;
          

          So yes, 000001 is preserved right after the restart because the replica hasn't reconnected yet, but once the connection is re-established and normal replication resumes, it gets purged as usual.

          Assigning to bnestere for further evaluation.

          Elkin Andrei Elkin added a comment (edited):

          As elenst's test shows, there is no issue with purging, except that it was not intuitively clear that SHOW BINARY LOGS would still list something like master-bin.000003 (8663 bytes) at a time when the slave has already processed that log.

          However, the presence of master-bin.000003 among the active logs is justified by the documentation:

          Possible purges happen at startup and at binary log rotation.

          That is, even though the slave is in sync, for that specific log to be purged the master has to rotate the binlog, e.g. through an explicit FLUSH LOGS.
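
          For completeness, a minimal sketch of forcing such a rotation on the master (FLUSH BINARY LOGS rotates only the binary log, whereas the FLUSH LOGS mentioned above flushes and rotates all logs):

          FLUSH BINARY LOGS;
          SHOW BINARY LOGS;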

          Furthermore, the description did not mention that notes such as
          Binary log 'binlog-m1.000001' is not purged because...
          are also thrown when slave_connections_needed_for_purge changes its value.
          This apparently needs documenting, and I suggest we close the ticket with that.

          Elkin Andrei Elkin added a comment:

          The lack of the expected purge is explained above. In the same comments, the related subject of the NOTE that appears in the master error log is addressed; it is covered in the KB page for slave_connections_needed_for_purge.
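
          For reference, a sketch of relaxing the requirement at runtime, in line with the report's observation that the behavior only occurs with slave_connections_needed_for_purge > 0 (assuming the variable can be changed dynamically; otherwise it can be set in the option file):

          SET GLOBAL slave_connections_needed_for_purge = 0;
          SHOW GLOBAL VARIABLES LIKE 'slave_connections_needed_for_purge';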


          People

            Assignee: Elkin Andrei Elkin
            Reporter: Ali.maria Alasdair Haswell
            Votes: 0
            Watchers: 5

