  1. MariaDB Server
  2. MDEV-11128

Asynchronous replication slave to MariaDB Galera Cluster failed after upgrade to 10.1.18 version

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.1.18
    • Fix Version/s: 10.1.19
    • Component/s: Galera, Replication
    • Labels:
      None
    • Environment:
      MariaDB Galera Cluster 10.1.18 on Amazon AWS EC2 m4.4xl, 3 nodes, CentOS 6.8 x86_64, plus an asynchronous replication slave on the same OS and MariaDB version, used for backups.

      Description

      I have a MariaDB Galera cluster on Amazon AWS EC2 (m4.4xl, 3 nodes) with an asynchronous replication slave attached for backup purposes. The slave runs the same MariaDB version and is attached using GTID. After a recent update of the Galera cluster nodes from 10.1.17 to 10.1.18, the async replication slave stopped with random errors like:

      • [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table ... Cannot add or update a child row: a foreign key constraint fails ... Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log mysql-bin.000531, end_log_pos 342554778, Gtid 0-101132-703126803, Internal MariaDB error code: 1452
      • [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table ... Can't find record in ... Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000523, end_log_pos 309442918, Gtid 0-101133-702983782, Internal MariaDB error code: 1032

      and similar. Replication is broken and I currently cannot recover it. I tried to restore it several times without success. To create the async replication slave I used percona-xtrabackup innobackupex, first version 2.3.5, then the latest version 2.4.4. The my.cnf configuration options are the same on the Galera nodes and on the async replica (except for disabled WSREP, the InnoDB buffer size, and a different server_id), and this configuration has been stable for more than a year. During the last recovery attempts I tried both methods: MASTER_LOG_FILE plus MASTER_LOG_POS, and GTID (setting gtid_slave_pos and running CHANGE MASTER TO master_use_gtid=slave_pos). Every time, replication stops at the same place with the same error. Of course, the MASTER_LOG_POS and/or GTID values differed between recovery attempts, but the failure was at the same place on each attempt.
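      To compare failures across recovery attempts, the fields in the error lines above can be pulled out mechanically, and the two re-attach methods can be written down as explicit statements. The following is only a minimal Python sketch, not part of the original report: the function names are hypothetical, the regular expression is assumed from the two error lines quoted above, and the starting GTID or binlog position must come from the backup itself (it is not the GTID printed in the error, which identifies the failing event).

```python
import re

# Assumed layout, taken from the two error lines quoted in this report.
ERROR_RE = re.compile(
    r"Could not execute (?P<event>\w+) event.*?"
    r"Error_code: (?P<code>\d+); handler error (?P<handler>\w+); "
    r"the event's master log (?P<log_file>[^\s,]+), "
    r"end_log_pos (?P<end_log_pos>\d+), "
    r"Gtid (?P<gtid>\d+-\d+-\d+)",
    re.DOTALL,
)


def parse_slave_error(line):
    """Return the failing event's details as a dict, or None if no match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["code"] = int(info["code"])
    info["end_log_pos"] = int(info["end_log_pos"])
    return info


def gtid_reattach_statements(start_gtid):
    """Statements for the GTID method; start_gtid comes from the backup."""
    return [
        "STOP SLAVE;",
        f"SET GLOBAL gtid_slave_pos = '{start_gtid}';",
        "CHANGE MASTER TO master_use_gtid = slave_pos;",
        "START SLAVE;",
    ]


def file_pos_reattach_statements(log_file, log_pos):
    """Statements for the MASTER_LOG_FILE / MASTER_LOG_POS method."""
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_LOG_FILE = '{log_file}', "
        f"MASTER_LOG_POS = {int(log_pos)}, master_use_gtid = no;",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    sample = (
        "[ERROR] Slave SQL: Could not execute Update_rows_v1 event on table ... "
        "Can't find record in ... Error_code: 1032; handler error "
        "HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000523, "
        "end_log_pos 309442918, Gtid 0-101133-702983782, "
        "Internal MariaDB error code: 1032"
    )
    print(parse_slave_error(sample))
    # "0-1-100" is a made-up placeholder, not a value from this report.
    print("\n".join(gtid_reattach_statements("0-1-100")))
```

      Recording the parsed event type, error code and GTID for each attempt makes it easier to state precisely whether two attempts failed on the same event.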

      I am currently trying to downgrade the MariaDB Galera cluster back to 10.1.17, which should help. But something definitely changed in binary logging in 10.1.18: probably log_slave_updates=1 is partially ignored, or innobackupex became incompatible with the 10.1.18 changes when taking a backup of a Galera cluster.

              People

              Assignee:
              nirbhay_c Nirbhay Choubey (Inactive)
              Reporter:
              kpvmaria Kaidalov Pavel