[MDEV-11128] Asynchronous replication slave to MariaDB Galera Cluster failed after upgrade to 10.1.18 version - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.1.18
Fix Version/s: 10.1.19
Component/s: Galera, Replication
Labels:
None
Environment:
MariaDB Galera cluster 10.1.18 on Amazon AWS EC2 m4.4xl, 3 nodes, CentOS 6.8 x86_64, plus an asynchronous replication slave on the same OS and MariaDB version, used for backup purposes.

Description

I have a MariaDB Galera cluster on Amazon AWS EC2 m4.4xl (3 nodes) with an asynchronous replication slave attached via GTID for backup purposes; the slave runs the same MariaDB version. After recently updating the Galera cluster nodes from 10.1.17 to 10.1.18, the async replication slave stopped with seemingly random errors such as:

[ERROR] Slave SQL: Could not execute Write_rows_v1 event on table ... Cannot add or update a child row: a foreign key constraint fails ... Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log mysql-bin.000531, end_log_pos 342554778, Gtid 0-101132-703126803, Internal MariaDB error code: 1452
[ERROR] Slave SQL: Could not execute Update_rows_v1 event on table ... Can't find record in ... Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000523, end_log_pos 309442918, Gtid 0-101133-702983782, Internal MariaDB error code: 1032

and similar. Replication is broken and I currently can't recover it. I tried to restore it several times with no success. To create the async replication slave I used percona-xtrabackup innobackupex, first version 2.3.5, then the latest version, 2.4.4. The my.cnf configuration options are the same on the Galera nodes and the async replica (except for disabled WSREP, the InnoDB buffer size, and a different server_id), and this configuration has been stable for more than a year. During the last recovery attempts I tried both MASTER_LOG_FILE plus MASTER_LOG_POS, and GTID by setting gtid_slave_pos and running CHANGE MASTER TO master_use_gtid=slave_pos. Every time, replication stops at the same position with the same error. Of course, the different recovery attempts started from different MASTER_LOG_POS and/or GTID values, but replication failed at the same place on each attempt.
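For comparing the failing event across recovery attempts, the error lines quoted above can be split into their fields with a short script. This is only a minimal sketch: the regular expression is written against the two lines in this report, and the names SlaveError and parse_slave_error are hypothetical helpers, not part of MariaDB or any of its tools.

```python
import re
from typing import NamedTuple, Optional


class SlaveError(NamedTuple):
    """Fields pulled out of one 'Slave SQL: Could not execute ...' line."""
    event_type: str
    error_code: int
    handler_error: str
    log_file: str
    end_log_pos: int
    gtid_domain: int
    gtid_server_id: int
    gtid_seq_no: int


# Assumption: the pattern follows the layout of the two lines quoted in this
# report; other MariaDB versions may format the message differently.
_PATTERN = re.compile(
    r"Could not execute (?P<event>\w+) event on table"
    r".*?Error_code: (?P<code>\d+);"
    r" handler error (?P<handler>\w+);"
    r".*?master log (?P<file>\S+), end_log_pos (?P<pos>\d+),"
    r" Gtid (?P<domain>\d+)-(?P<server>\d+)-(?P<seq>\d+)",
    re.DOTALL,
)


def parse_slave_error(line: str) -> Optional[SlaveError]:
    """Return the parsed fields, or None if the line does not match."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return SlaveError(
        event_type=m.group("event"),
        error_code=int(m.group("code")),
        handler_error=m.group("handler"),
        log_file=m.group("file"),
        end_log_pos=int(m.group("pos")),
        gtid_domain=int(m.group("domain")),
        gtid_server_id=int(m.group("server")),
        gtid_seq_no=int(m.group("seq")),
    )


if __name__ == "__main__":
    import sys

    # Read an error log on stdin and print one parsed record per match.
    for raw in sys.stdin:
        parsed = parse_slave_error(raw)
        if parsed is not None:
            print(parsed)
```

Applied to the two lines above, the server_id part of the GTID comes out as 101132 for the first error and 101133 for the second, so the two failing events presumably originated on different cluster nodes.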

I am currently trying to downgrade the MariaDB Galera cluster back to 10.1.17, which should help. But something definitely changed in binary logging in 10.1.18: probably log_slave_updates=1 is partially ignored, or innobackupex became incompatible with the 10.1.18 changes when taking a backup of a Galera cluster node.

Issue Links

is duplicated by

MDEV-10944 GALERA log-slave-updates REGRESSION FAILURE - after upgrading from 10.1.17 to 10.1.18

Closed

People

Assignee: Nirbhay Choubey (Inactive)

Reporter: Kaidalov Pavel

Votes: 0

Watchers: 5

Dates

Created: 2016-10-24 21:25

Updated: 2016-10-25 15:15

Resolved: 2016-10-25 15:15
