[MDEV-30627] Slave unable to sync with master Created: 2023-02-09  Updated: 2023-02-17

Status: Open
Project: MariaDB Server
Component/s: Backup, Replication
Affects Version/s: 10.6.11
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Kevin Wang Assignee: Angelique Sklavounos (Inactive)
Resolution: Unresolved Votes: 0
Labels: None
Environment:

RedHat 8



 Description   

Step to reproduce:

  1. make a full backup on master node using mariabackup
  2. copy the backup to slave node
  3. on slave node, restore the backup in /var/lib/mysql, then start mariadb service
  4. set CHANGE MASTER to master node, then start slave

problems:
when replication just start, it does replicate from master, but quickly, in "show slave status", I got:

                    Last_Errno: 1062
                    Last_Error: Could not execute Write_rows_v1 event on table InboundBankStatements.MT940Transactions; Duplicate entry 'DK3852950010031982-2023-11-1-DKK' for key 'AccountNrYearStatementSequenceCurrency', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.008366, end_log_pos 81077987

after "set global SQL_SLAVE_SKIP_COUNTER = 1;", new error appeared:

                    Last_Errno: 1452
                    Last_Error: Could not execute Write_rows_v1 event on table InboundBankStatements.MT940Balances; Cannot add or update a child row: a foreign key constraint fails (`InboundBankStatements`.`MT940Balances`, CONSTRAINT `_balance_trx_fk1` FOREIGN KEY (`TrxId`) REFERENCES `MT940Transactions` (`Id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log mysql-bin.008366, end_log_pos 81078709

After downgraded MariaDB on slave node to 10.6.10, then repeat the steps above, the replication worked nicely.



 Comments   
Comment by Andrei Elkin [ 2023-02-09 ]

kevinwang1975, thanks for the report! We definitely need a more detailed description of your steps, necessarily providing us with exact commands esp. for pp.3,4.
p.4 may be the most critical as through it one specifies the slave state in terms of binlog file:pos or, recommended, GTID. Unless you do it already that way I suggest master_use_gtid=....
Should you fail to set the slave state correctly an error like your slave experienced is inevitable.

Comment by Kevin Wang [ 2023-02-09 ]

Hi Andrei,

Step 3)
The backup data dir copied from the master node to "/var/lib/mysql" on the slave node, and restore using the following command

mariabackup --parallel=16 --target-dir "/var/lib/mysql" --decompress --remove-original
chown mysql:mysql -R /var/lib/mysql
systemctl start mariadb

Step 4)
Bin log filename and position was read from "/var/lib/mysql/xtrabackup_binlog_info" (from mariabackup data files)

MariaDB [(none)]> CHANGE MASTER TO
    MASTER_HOST='<MASTER_DB_HOST>',
    MASTER_USER='<REPLICATION_USER>',
    MASTER_PASSWORD='<REPLICATION_PASSWORD>',
    MASTER_LOG_FILE='<MASTER_BIN_LOG_FILE>',
    MASTER_LOG_POS=<MASTER_BIN_LOG_POS>,
    MASTER_CONNECT_RETRY=300,
    MASTER_SSL_CERT='/etc/my.cnf.d/ssl/server-cert.pem',
    MASTER_SSL_KEY='/etc/my.cnf.d/ssl/server-key.pem',
    MASTER_SSL_CA='/etc/my.cnf.d/ssl/ca-cert.pem',
    MASTER_SSL_VERIFY_SERVER_CERT=1,
    MASTER_SSL_CIPHER='TLSv1.2',
    MASTER_SSL=1;
 
MariaDB [(none)]> START SLAVE;

The commands above have been used more than once with different versions of MariaDB with no issue. e.g. 10.4, 10.6, etc. We only ran into issue this time with 10.6.11 we recently upgraded to.

Comment by Angelique Sklavounos (Inactive) [ 2023-02-15 ]

Hi kevinwang1975

I'm trying to reproduce this and wanted to confirm/ask a few things.

  1. What was the exact command used to create the backup on the master node?
  2. Was the backup created with 10.6.11?
  3. Could you please attach the my.cnf files for both servers to the ticket?
  4. If possible, could you please upload the dump and/or query logs (item e) to our private FTP server?

Thank you.

Comment by Kevin Wang [ 2023-02-16 ]

Hi Angelique,

1. backup command:

/usr/bin/mariabackup --backup --no-version-check --socket=/var/lib/mysql/mysql.sock \
  --parallel=4 --compress-threads=4 \
  --extra-lsndir=/var/tmp --compress --target-dir=/mnt/backup

2. yes, backup was made with 10.6.11

About 3 and 4, as it is a PROD system, I don't have to freedom to share the content, sorry.

Generated at Thu Feb 08 10:17:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.