[MXS-4658] Post-reboot binlog router entered stuck state Created: 2023-06-29  Updated: 2023-09-27  Resolved: 2023-06-30

Status: Closed
Project: MariaDB MaxScale
Component/s: binlogrouter
Affects Version/s: 22.08.4
Fix Version/s: 2.5.26, 6.4.8, 22.08.7, 23.02.3

Type: Bug Priority: Minor
Reporter: Bryan Bancroft (Inactive) Assignee: Niclas Antti
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Ubuntu, on-premises VM
2 MaxScale nodes with keepalived, 2 local regular async replicas, 1 off-site SkySQL DR replica replicating through the binlog router


Issue Links:
Relates
relates to MXS-4631 Manually deleting log files breaks th... Closed

 Description   

After scheduled OS patching and the subsequent reboot, the following happened with the binlog router.

The database server reported:
[Warning] Aborted connection 207574 to db: 'unconnected' user: 'maxscale' host: '10.90.194.37' (Got an error writing communication packets)

MaxScale logged the following error:
2023-06-28 15:13:14 error : (Replication-Proxy); Error received during replication from '10.90.194.79:3306': Could not open /var/lib/maxscale/binlogs//mariadb-bin.000049 for STOP_EVENT addition

The replica status was stuck with Slave_IO_Running = Connecting, or the connection timed out.

The binlog file on MaxScale was not malformed; it contained only the header.

The issue was resolved by deleting all files in /var/lib/maxscale/binlogs and then starting the service again. The router then re-fetched the binlogs and resumed normal operation.
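The manual recovery above can be sketched in Python. This is a minimal illustration under stated assumptions, not a MaxScale tool: it assumes MaxScale has already been stopped, that the binlog router keeps its files in /var/lib/maxscale/binlogs (the directory named in the error message), and the helper name clear_binlog_dir is hypothetical. It removes only regular files and leaves any subdirectories alone.

```python
from pathlib import Path
from typing import List

# Directory taken from the error message in this ticket; the router's
# datadir may be configured differently on other installations (assumption).
BINLOG_DIR = Path("/var/lib/maxscale/binlogs")


def clear_binlog_dir(binlog_dir: Path = BINLOG_DIR) -> List[str]:
    """Delete every regular file in binlog_dir and return the removed names.

    Hypothetical helper: run it only while MaxScale is stopped so the
    binlog router is not writing to these files at the same time.
    Subdirectories are left untouched.
    """
    removed = []
    for entry in sorted(binlog_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Once the directory is empty, starting MaxScale again lets the router pull the binlogs from the server afresh. This is the blunt option the reporter used: it discards everything the router had already downloaded rather than repairing individual files.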

The standby MaxScale node did not experience this problem.



 Comments   
Comment by Niclas Antti [ 2023-06-30 ]

It looks like the file /var/lib/maxscale/binlogs//mariadb-bin.000049 really did not exist. Perhaps MaxScale was not shut down cleanly.
Deleting all the files works, although you may have to set the starting GTID manually if the server no longer has its binlogs from the very beginning.
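Before deleting anything, the diagnosis in this comment (the file named in the error does not exist) could be checked programmatically. The sketch below is a hypothetical Python helper, not part of MaxScale; it assumes the error text has the form shown in the description, "Could not open <path> for STOP_EVENT addition", with no spaces in the path.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the path in errors such as:
# "... Could not open /var/lib/maxscale/binlogs//mariadb-bin.000049 for STOP_EVENT addition"
_MISSING_FILE_RE = re.compile(r"Could not open (\S+) for STOP_EVENT addition")


def missing_binlog(log_line: str) -> Optional[Path]:
    """Return the binlog path named in log_line if that file does not exist.

    Returns None when the line is not a STOP_EVENT error or when the
    file is present on disk.
    """
    match = _MISSING_FILE_RE.search(log_line)
    if match is None:
        return None
    path = Path(match.group(1))  # Path() collapses the doubled slash
    return None if path.exists() else path
```

A result of None means either the line is not this kind of error or the file is present, in which case the cause lies elsewhere.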

Once the fix for https://jira.mariadb.org/browse/MXS-4631 is released, binlogrouter should detect and handle this situation by itself.

Generated at Thu Feb 08 04:30:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.