[MDEV-29785] Duplicate gtid event in binlog Created: 2022-10-13  Updated: 2022-10-19

Status: Open
Project: MariaDB Server
Component/s: Galera, Replication
Affects Version/s: 10.4.26
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jesse
Assignee: Angelique Sklavounos (Inactive)
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

RHEL7 derivative, 3-node Galera cluster


Attachments: Text File duplicate_event_binlog.txt    

 Description   

Because the GTIDs drift during the upgrade of a Galera node (mysql_upgrade --skip-write-binlog --system-tables), I ran the following to resynchronize the nodes:

SET SESSION gtid_seq_no=<a nice number above the current gtid>;
INSERT INTO hurfdurf (a) VALUES (1);

With 10.4.26, the GTID event was recorded twice in the binlog, which caused binlog replication to stop. This is a new bug introduced after 10.4.21, as previous upgrades using the same method worked fine. 10.5.17, using SET SESSION wsrep_gtid_seq_no instead, is not affected.

Workaround:
1. Stop the slave, switch to file+position replication starting after the duplicate event, and start the slave.
2. Flush the binlog at the source, as you can't continue from a file that contains the duplicate event, even if you're past the duplicate event itself.
3. Stop the slave, switch back to GTID-based replication, and start the slave.
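
The three workaround steps could be sketched roughly as the SQL below, wrapped in small shell helpers so the statements can be reviewed before they are run. This is only a sketch: the helper names are hypothetical, the binlog file name and position in the usage comments are placeholders to be read from SHOW SLAVE STATUS and mysqlbinlog output, and the container names are borrowed from the podman reproduction in the comments.

```shell
#!/bin/sh
# Hypothetical helpers that print the SQL for each workaround step.
# Nothing is executed against a server until the output is passed to mysql.

# Step 1 (on the slave): switch to file+position replication.
# $1 = binlog file on the source, $2 = position just past the duplicate event.
workaround_step1_sql() {
    printf "STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=no, MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s; START SLAVE;\n" "$1" "$2"
}

# Step 2 (on the source): rotate to a fresh binlog file.
workaround_step2_sql() {
    printf 'FLUSH BINARY LOGS;\n'
}

# Step 3 (on the slave): switch back to GTID-based replication.
workaround_step3_sql() {
    printf 'STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=slave_pos; START SLAVE;\n'
}

# Example usage (file name and position are placeholders):
#   podman exec mariadb-slave  mysql -uroot -proot -e "$(workaround_step1_sql 1.000002 1234)"
#   podman exec mariadb-master mysql -uroot -proot -e "$(workaround_step2_sql)"
#   podman exec mariadb-slave  mysql -uroot -proot -e "$(workaround_step3_sql)"
```

Presumably step 3 should only be run once the slave has read past the rotated file, since step 2 exists precisely because the file containing the duplicate event cannot be used.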



 Comments   
Comment by Angelique Sklavounos (Inactive) [ 2022-10-14 ]

Hi ospi

Thank you for the report. I will try to reproduce, but could I please get more details on the replication topology of the setup? Also, what is the SHOW SLAVE STATUS output when replication stops?

Thank you.

Comment by Jesse [ 2022-10-18 ]

Actually, this isn't Galera related. It is more likely a bug related to wsrep_gtid_mode.

Steps to reproduce with podman
/opt/MDEV-29785/server.cnf contains the following:

[server]
log-bin=1
wsrep_gtid_mode=1
server-id=777

Podman steps

podman run -dt --rm -v /opt/MDEV-29785:/etc/mysql/conf.d --name mariadb-master -e MYSQL_ROOT_PASSWORD=root mariadb:10.4
podman run -dt --rm --name mariadb-slave -e MYSQL_ROOT_PASSWORD=root mariadb:10.4
 
# Start slave
change_master="
CHANGE MASTER TO MASTER_HOST=\"$(podman inspect mariadb-master -f '{{ .NetworkSettings.IPAddress }}')\",
MASTER_USER=\"root\",
MASTER_PASSWORD=\"root\",
MASTER_USE_GTID=slave_pos;
START SLAVE;"
podman exec -it mariadb-slave mysql -uroot -proot -e "$change_master"
 
# Do something on master
podman exec -it mariadb-master mysql -uroot -proot -e "create database derp"
# verify replication is OK
podman exec -it mariadb-slave mysql -uroot -proot -e "show databases"
 
# This works : bump gtid seq no and create a database
podman exec -it mariadb-master mysql -uroot -proot -e "SET SESSION gtid_seq_no=1000; CREATE DATABASE derp_also;"
 
# This works : bump gtid seq no and create a table
podman exec -it mariadb-master mysql -uroot -proot -e "SET SESSION gtid_seq_no=2000; CREATE TABLE derp.duh (id int unsigned auto_increment primary key);"
 
# This doesn't work : bump gtid and INSERT
podman exec -it mariadb-master mysql -uroot -proot -e "SET SESSION gtid_seq_no=3000; INSERT INTO derp.duh (id) VALUES (null);"
 
# Slave stopped
podman exec -it mariadb-slave mysql -uroot -proot -e "show slave status \G"
# Binlogs show duplicate event for GTID 0-777-3000 trans
podman exec -it mariadb-master /bin/bash -c "/usr/bin/mysqlbinlog /var/lib/mysql/\$(tail -n1 /var/lib/mysql/1.index)"
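
Rather than reading the mysqlbinlog dump by eye, the duplicate could be picked out with a small filter. The following is a minimal sketch, assuming the GTID event header lines in the dump contain text of the form "GTID 0-777-3000 trans" as described above; the function name find_duplicate_gtids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read mysqlbinlog text on stdin and print every
# GTID (domain-server-seqno) that appears in more than one GTID event header.
find_duplicate_gtids() {
    grep -oE 'GTID [0-9]+-[0-9]+-[0-9]+' |  # keep only the "GTID d-s-n" fragments
        awk '{print $2}' |                  # drop the literal "GTID" prefix
        sort |
        uniq -d                             # print values that occur at least twice
}

# Example usage against the master container from the steps above
# (run without -it so no terminal carriage returns end up in the pipe):
#   podman exec mariadb-master /bin/bash -c \
#     "/usr/bin/mysqlbinlog /var/lib/mysql/\$(tail -n1 /var/lib/mysql/1.index)" \
#     | find_duplicate_gtids
```

The filter prints nothing when every GTID in the dump is unique, so empty output on an unaffected version is the expected result.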
