[MDEV-25809] Can't start replication. Missing rows Created: 2021-05-28  Updated: 2021-06-04

Status: Open
Project: MariaDB Server
Component/s: mariabackup
Affects Version/s: 10.5.6, 10.5.10
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Dutchak Vitalij Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Ubuntu 18.04 LTS



 Description   

I tried to set up one-way replication from one wsrep cluster to another wsrep cluster.
I followed this article: https://mariadb.com/kb/en/configuring-mariadb-replication-between-two-mariadb-galera-clusters/

I chose one server from the first cluster as the replication master and put the following in its config:

log-bin
binlog_format=ROW
log-basename=backup
expire_logs_days=2
log_slave_updates=ON
max_binlog_size=1073741824

Then I made a full backup and prepared it:

mariabackup --backup --user=user --password password --target-dir=/var/lib/backup/repl/
mariabackup --prepare --target-dir=/var/lib/backup/repl/ --use-memory=32G

Then I copied the backup to another server, which I chose as the replication slave, and copied the files back:

mariabackup --copy-back --target-dir=/var/lib/backup/repl/ 
chown mysql:mysql /path/to/data/ -R

From the file xtrabackup_binlog_info I took the binlog file name and position (backup-bin.000017 and 47155747).
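Copying the coordinates by hand is easy to get wrong, so the step above can be sketched in Python. This is a minimal sketch, assuming the file's first line holds the binlog file name and the position separated by whitespace (any further columns, such as a GTID position, are ignored). The names parse_binlog_info and change_master_clause are hypothetical helpers, not part of mariabackup.

```python
import re

def parse_binlog_info(text):
    """Return (log_file, log_pos) from the contents of xtrabackup_binlog_info."""
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected a binlog file name and a position")
    log_file, log_pos = fields[0], int(fields[1])
    # Assumption: binlog file names look like <basename>.<digits>.
    if not re.fullmatch(r"[\w.-]+\.\d+", log_file):
        raise ValueError(f"unexpected binlog file name: {log_file!r}")
    return log_file, log_pos

def change_master_clause(log_file, log_pos):
    """Build the coordinate part of a CHANGE MASTER TO statement."""
    return f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}"

# Values taken from this report; the tab separator is an assumption.
sample = "backup-bin.000017\t47155747\n"
log_file, log_pos = parse_binlog_info(sample)
print(change_master_clause(log_file, log_pos))
```

The resulting clause can be pasted into the CHANGE MASTER TO statement next to the host and credential options.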

Then I bootstrapped a new wsrep cluster with only one node and tried to start replication between this node and the master:

CHANGE MASTER TO 
   MASTER_HOST="master-node", 
   MASTER_PORT=3306, 
   MASTER_USER="replica",  
   MASTER_PASSWORD="secret", 
   MASTER_LOG_FILE='backup-bin.000017',
   MASTER_LOG_POS=47155747;
 
START SLAVE;

The node was dropped from the cluster. I checked error.log and found these errors:

2021-05-28  9:55:43 8723 [ERROR]  Slave SQL: Could not execute Update_rows_v1 event on table s1.table1; Can't find record in 'table1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log backup-bin.000017, end_log_pos 47501643, Gtid 1-102-2391016143, Internal MariaDB error code: 1032
2021-05-28  9:55:43 8723 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 1-102-2391016143, Internal MariaDB error code: 1047
2021-05-28  9:55:43 8723 [Note] Slave SQL thread exiting, replication stopped in log 'backup-bin.000017' at position 47501209

So I checked what happened in the binlog at this position and found an update of one record. I checked this record on the master and on the slave: it is present on the master but not on the slave. Then I checked whether this record appears in the binlog and found its insert query at position 47071167, which is lower than the position recorded by the backup (47155747).
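The ordering argument here can be written out as a small Python check. This is a sketch under one assumption: the insert at 47071167 lives in the same binlog file as the backup position (the report gives only the offset). The helper names binlog_key and precedes are hypothetical.

```python
def binlog_key(log_file, log_pos):
    """Sort key for binlog coordinates: file sequence number, then byte offset."""
    index = int(log_file.rsplit(".", 1)[1])
    return (index, log_pos)

def precedes(a, b):
    """True if coordinate a comes strictly before coordinate b."""
    return binlog_key(*a) < binlog_key(*b)

backup_point = ("backup-bin.000017", 47155747)   # from xtrabackup_binlog_info
insert_event = ("backup-bin.000017", 47071167)   # file name assumed, see above
failed_update = ("backup-bin.000017", 47501643)  # end_log_pos from error.log

# The insert precedes the backup point, so the row should be in the backup.
print(precedes(insert_event, backup_point))
# The failing update follows the backup point, so the slave replays it.
print(precedes(backup_point, failed_update))
```

Both comparisons hold for the values in this report: the slave is asked to update a row whose insert it never replays, so the row has to come from the restored data files.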

As I understand it, the problem is in the backup or the prepare operation: either the backup does not contain all the data, or the binlog position is wrong after the prepare operation.

I repeated this a couple of times and got the same result. What can I check in the configs, or am I missing something?



 Comments   
Comment by Dutchak Vitalij [ 2021-06-04 ]

Found the problem. Sorry for the distraction.
The problem was the combination of

innodb_flush_log_at_trx_commit = 0

and NVMe drives. Setting

innodb_flush_log_at_trx_commit = 1

fixed the problem.
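Before taking a backup for replication, the setting can be checked with a few lines of Python. This is a minimal sketch, assuming the server options sit in a [mysqld] section of a file that the standard configparser module can read (it does not handle !include or !includedir directives). The helper names read_option and redo_flushed_on_commit are hypothetical.

```python
import configparser

def read_option(cnf_text, name, section="mysqld"):
    """Return the raw value of an option, or None if it is not set."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as log-bin have no value
        strict=False,         # tolerate repeated options; the last one wins
        interpolation=None,   # treat % literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return None
    wanted = name.replace("-", "_")
    for key, value in parser.items(section):
        # MariaDB accepts dashes and underscores interchangeably in option names.
        if key.replace("-", "_") == wanted:
            return value
    return None

def redo_flushed_on_commit(cnf_text):
    """True if the redo log is flushed at every commit (value 1, the default)."""
    value = read_option(cnf_text, "innodb_flush_log_at_trx_commit")
    return value is None or value.strip() == "1"

# Sample config; only the last line reflects the setting named in this comment.
SAMPLE = """
[mysqld]
log-bin
binlog_format=ROW
innodb_flush_log_at_trx_commit = 0
"""

print(redo_flushed_on_commit(SAMPLE))
```

A missing option is treated as safe because 1 is the documented default. The check reads only the file, so a value changed at runtime with SET GLOBAL would not be seen.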
