MariaDB Server / MDEV-21397

rsync SST fails if donor is running >= 10.1.36 and joiner is running < 10.1.36


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 10.1.32, 10.1.38
    • Fix Version/s: N/A
    • Component/s: Galera SST
    • Labels:
      None

      Description

      An rsync SST fails if the donor node is running MariaDB 10.1.36 or later and the joiner node is running MariaDB 10.1.35 or earlier.

      To reproduce:

      • Start with a cluster in which every node runs MariaDB 10.1.32 (wsrep_25.23).
      • Upgrade one node to MariaDB 10.1.38 (wsrep_25.24).
      • Restart a node that is still running MariaDB 10.1.32.
      • Ensure that this node requests an rsync SST from the node running MariaDB 10.1.38.

      If the cluster uses the rsync SST method, the state transfer fails.

      On the joiner node, it fails like this:

      2019-12-17 16:58:57 139757220779776 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 50)
      2019-12-17 16:58:57 139757553883904 [Note] WSREP: Requesting state transfer: success, donor: 1
      2019-12-17 16:58:57 139757553883904 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(1216d81e-16e3-11ea-83b5-475d0cb60198:50)
      2019-12-17 16:58:57 139757220779776 [Warning] WSREP: 1.0 (10.9.29.37): State transfer to 0.0 (10.9.29.36) failed: -255 (Unknown error 255)
      2019-12-17 16:58:57 139757220779776 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
      2019-12-17 16:58:57 139757220779776 [Note] WSREP: gcomm: terminating thread
      2019-12-17 16:58:57 139757220779776 [Note] WSREP: gcomm: joining thread
      2019-12-17 16:58:57 139757220779776 [Note] WSREP: gcomm: closing backend
      

      On the donor node, it fails like this:

      2019-12-17 16:58:56 140514099697408 [Note] WSREP: Flushing tables for SST...
      2019-12-17 16:58:56 140514099697408 [Note] WSREP: Provider paused at 1216d81e-16e3-11ea-83b5-475d0cb60198:50 (76)
      2019-12-17 16:58:56 140514099697408 [Note] WSREP: Tables flushed.
      @ERROR: Unknown module 'rsync_sst-data_dir'
      rsync error: error starting client-server protocol (code 5) at main.c(1516) [sender=3.0.9]
      WSREP_SST: [ERROR] rsync innodb_data_home_dir returned code 5: (20191217 16:58:57.152)
      2019-12-17 16:58:57 140514099697408 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '10.9.29.36:4444/rsync_sst' --socket '/mariadb/run/mysql.sock' --datadir '/mariadb/persistent/' --defaults-file '/usr/local/mariadb/mariadb_base/etc/my.cnf' '' --gtid '1216d81e-16e3-11ea-83b5-475d0cb60198:50' --gtid-domain-id '0'
      2019-12-17 16:58:57 140514099697408 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '10.9.29.36:4444/rsync_sst' --socket '/mariadb/run/mysql.sock' --datadir '/mariadb/persistent/' --defaults-file '/usr/local/mariadb/mariadb_base/etc/my.cnf' '' --gtid '1216d81e-16e3-11ea-83b5-475d0cb60198:50' --gtid-domain-id '0': 255 (Unknown error 255)
      2019-12-17 16:58:57 140514099697408 [Note] WSREP: resuming provider at 76
      2019-12-17 16:58:57 140514099697408 [Note] WSREP: Provider resumed.
      2019-12-17 16:58:57 140514099697408 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '10.9.29.36:4444/rsync_sst' --socket '/mariadb/run/mysql.sock' --datadir '/mariadb/persistent/' --defaults-file '/usr/local/mariadb/mariadb_base/etc/my.cnf' '' --gtid '1216d81e-16e3-11ea-83b5-475d0cb60198:50' --gtid-domain-id '0'
      2019-12-17 16:58:57 140516960229120 [Warning] WSREP: 1.0 (10.9.29.37): State transfer to 0.0 (10.9.29.36) failed: -255 (Unknown error 255)
      2019-12-17 16:58:57 140516960229120 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 50)
      

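      The donor log shows that the failure occurred when the donor tried to transfer data to the rsync module rsync_sst-data_dir. The rsync SST script uses this module to refer to the path defined by innodb_data_home_dir. Support for it was added in MariaDB 10.1.36 by MDEV-10754, so a joiner running an earlier version does not appear to define the module, and its rsync daemon rejects the donor's request with "Unknown module".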
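
      To confirm that a failed SST matches this report, the donor's error log can be scanned for the rsync "Unknown module" rejection. The following is a minimal sketch; the function name find_unknown_rsync_modules and the sample text are illustrative assumptions, and only the error line format is taken from the donor log above.

      ```python
      import re

      # Matches the rsync daemon's rejection of a module name it does not define,
      # e.g.  @ERROR: Unknown module 'rsync_sst-data_dir'
      UNKNOWN_MODULE_RE = re.compile(r"@ERROR: Unknown module '([^']+)'")


      def find_unknown_rsync_modules(log_text):
          """Return the rejected rsync module names, in order of first appearance.

          Hypothetical helper for illustration; not part of MariaDB or Galera.
          """
          seen = []
          for match in UNKNOWN_MODULE_RE.finditer(log_text):
              name = match.group(1)
              if name not in seen:
                  seen.append(name)
          return seen


      # Sample lines copied from the donor log in this report.
      sample = """\
      2019-12-17 16:58:56 140514099697408 [Note] WSREP: Tables flushed.
      @ERROR: Unknown module 'rsync_sst-data_dir'
      rsync error: error starting client-server protocol (code 5) at main.c(1516) [sender=3.0.9]
      """

      print(find_unknown_rsync_modules(sample))
      ```

      If the returned list contains rsync_sst-data_dir, the donor is most likely newer than the joiner in the way described here.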

      This means that a node running MariaDB 10.1.36 or later cannot act as an rsync SST donor for a node running MariaDB 10.1.35 or earlier.
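
      The rule can be expressed as a small pre-upgrade check. This is a sketch under stated assumptions: the helper names are hypothetical, the 10.1.36 threshold comes from this report, and the check deliberately refuses versions outside the 10.1 series because the report does not cover them.

      ```python
      # First 10.1 release whose rsync SST donor uses the rsync_sst-data_dir
      # module, according to this report (MDEV-10754).
      FIRST_VERSION_WITH_DATA_DIR_MODULE = (10, 1, 36)


      def parse_version(version):
          """Turn '10.1.38' or '10.1.38-MariaDB' into the tuple (10, 1, 38)."""
          numeric = version.split("-", 1)[0]
          return tuple(int(part) for part in numeric.split("."))


      def rsync_sst_pair_affected(donor_version, joiner_version):
          """Return True if this donor/joiner pair hits the failure described here.

          Hypothetical helper; only MariaDB 10.1.x versions are accepted.
          """
          donor = parse_version(donor_version)
          joiner = parse_version(joiner_version)
          for parsed in (donor, joiner):
              if parsed[:2] != (10, 1):
                  raise ValueError("only MariaDB 10.1.x versions are covered")
          return (donor >= FIRST_VERSION_WITH_DATA_DIR_MODULE
                  and joiner < FIRST_VERSION_WITH_DATA_DIR_MODULE)


      # The combination from the reproduction steps: donor 10.1.38, joiner 10.1.32.
      print(rsync_sst_pair_affected("10.1.38", "10.1.32"))
      ```

      The reverse direction (older donor, newer joiner) is not reported as failing here, so the check returns False for it.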

              People

              Assignee:
              jplindst Jan Lindström
              Reporter:
              juan.vera Juan