MariaDB Server / MDEV-13657

One of the nodes can't rejoin the cluster with the xtrabackup-v2 SST method

Details

    Description

      A few days ago we upgraded our 3-node cluster from MariaDB 10.1 to 10.2.
      Today one of the nodes became corrupted,
      so we decided to rejoin it to the cluster using SST, so that it would receive a full copy of the data and come back.
      But when I start MariaDB on that node, an error occurs after a few seconds.
      I checked the error log, but I do not understand why this error occurs.
      This problem did not occur on 10.0 or 10.1: whenever a node needed an SST, everything worked well. But now...

      This is the error log of the donor node:

      WSREP_SST: [INFO] Evaluating innobackupex   --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | pigz | socat -u stdio TCP:xx.xx.xx.xx:4444; RC=( ${PIPESTATUS[@]} ) (20170827 15:51:01.065)
      2017-08-27 15:51:01 140610122925824 [Warning] Aborted connection 834 to db: 'unconnected' user: 'xtrabackup' host: 'localhost' (Got an error reading communication packets)
      WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /home/mysql//innobackup.backup.log (20170827 15:51:01.210)
      WSREP_SST: [ERROR] Cleanup after exit with status:22 (20170827 15:51:01.212)
      2017-08-27 15:51:01 140609528502016 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'xx.xx.xx.xx:4444/xtrabackup_sst//1' --socket '/home/mysql/mysql.sock' --datadir '/home/mysql/'    --binlog 'binlog' --gtid 'dda62f30-e8ff-11e5-b5ce-3754b5ddaee3:8614408762' --gtid-domain-id '0'
      2017-08-27 15:51:01 140609528502016 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'xx.xx.xx.xx:4444/xtrabackup_sst//1' --socket '/home/mysql/mysql.sock' --datadir '/home/mysql/'    --binlog 'binlog' --gtid 'dda62f30-e8ff-11e5-b5ce-3754b5ddaee3:8614408762' --gtid-domain-id '0': 22 (Invalid argument)
      2017-08-27 15:51:01 140609528502016 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'xx.xx.xx.xx:4444/xtrabackup_sst//1' --socket '/home/mysql/mysql.sock' --datadir '/home/mysql/'    --binlog 'binlog' --gtid 'dda62f30-e8ff-11e5-b5ce-3754b5ddaee3:8614408762' --gtid-domain-id '0'
      2017-08-27 15:51:01 140656739600128 [Warning] WSREP: 0.0 (node3): State transfer to 2.0 (node1) failed: -22 (Invalid argument)
      2017-08-27 15:51:01 140656739600128 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 8614408973)
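
The first line of the donor log shows the SST script capturing `RC=( ${PIPESTATUS[@]} )` after the `innobackupex | pigz | socat` pipeline, which is how it can report that innobackupex specifically finished with error 1. A minimal bash sketch of that mechanism follows; `false` and `cat` are stand-ins chosen for illustration, and the function name is hypothetical, not taken from the SST script.

```shell
#!/usr/bin/env bash
# Run a three-stage pipeline and report the exit status of every stage.
# `false` stands in for a failing innobackupex; the two `cat` stages stand in
# for pigz and socat (all stand-ins are assumptions for illustration).
pipeline_statuses() {
    local rc                      # declared first: any command run later would reset PIPESTATUS
    false | cat | cat
    rc=( "${PIPESTATUS[@]}" )     # must be captured immediately after the pipeline
    echo "${rc[*]}"
}

pipeline_statuses   # prints: 1 0 0
```

Without `PIPESTATUS`, only the status of the last stage would be visible, and a failure in the first stage would go unnoticed.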
      

      And this is the innobackup.backup.log:

      170827 15:51:01 innobackupex: Starting the backup operation
       
      IMPORTANT: Please check that the backup run completes successfully.
                 At the end of a successful backup run innobackupex
                 prints "completed OK!".
       
      170827 15:51:01 Connecting to MySQL server host: localhost, user: xtrabackup, password: set, port: not set, socket: /home/mysql/mysql.sock
      Using server version 10.2.8-MariaDB-log
      innobackupex version 2.3.9 based on MySQL server 5.6.24 Linux (x86_64) (revision id: fde0e3e)
      xtrabackup: uses posix_fadvise().
      xtrabackup: cd to /home/mysql/
      xtrabackup: open files limit requested 0, set to 16364
      xtrabackup: using the following InnoDB configuration:
      xtrabackup:   innodb_data_home_dir = ./
      xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
      xtrabackup:   innodb_log_group_home_dir = ./
      xtrabackup:   innodb_log_files_in_group = 2
      xtrabackup:   innodb_log_file_size = 1073741824
      xtrabackup: using O_DIRECT
      InnoDB: No valid checkpoint found.
      InnoDB: If this error appears when you are creating an InnoDB database,
      InnoDB: the problem may be that during an earlier attempt you managed
      InnoDB: to create the InnoDB data files, but log file creation failed.
      InnoDB: If that is the case, please refer to
      InnoDB: http://dev.mysql.com/doc/refman/5.6/en/error-creating-innodb.html
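
The log above reports a 10.2.8-MariaDB server being backed up by innobackupex 2.3.9, which is based on MySQL server 5.6.24. Whether that version gap explains "No valid checkpoint found" is not established in this report, but the two versions are worth comparing on every node. A small sketch for pulling them out of a backup log read on stdin; the function name is hypothetical and the patterns simply match the two lines shown above.

```shell
# Extract the server version and the innobackupex version from a backup log
# read on stdin. The patterns match the two version lines in the log above.
backup_log_versions() {
    sed -n \
        -e 's/^ *Using server version \(.*\)$/server: \1/p' \
        -e 's/^ *innobackupex version \([0-9.]*\) .*$/innobackupex: \1/p'
}

backup_log_versions <<'EOF'
Using server version 10.2.8-MariaDB-log
innobackupex version 2.3.9 based on MySQL server 5.6.24 Linux (x86_64) (revision id: fde0e3e)
EOF
```

In practice the function would be fed the log file named in the donor error, for example `backup_log_versions < /home/mysql/innobackup.backup.log`.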
      

      Error log of the joiner:

      WSREP_SST: [INFO] Waiting for SST streaming to complete! (20170827 15:55:51.479)
      2017-08-27 15:55:52 139946286794496 [Note] WSREP: (7a3de297, 'tcp://0.0.0.0:4567') connection to peer 7a3de297 with addr tcp://xx.xx.xx.xx:4567 timed out, no messages seen in PT3S
      2017-08-27 15:55:52 139946286794496 [Note] WSREP: (7a3de297, 'tcp://0.0.0.0:4567') turning message relay requesting off
      WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20170827 15:56:01.566)
      WSREP_SST: [ERROR] Cleanup after exit with status:2 (20170827 15:56:01.568)
      2017-08-27 15:56:01 139919355795200 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address 'xx.xx.xx.xx' --datadir '/home/mysql/'   --parent '1313' --binlog 'binlog' : 2 (No such file or directory)
      2017-08-27 15:56:01 139919355795200 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
      2017-08-27 15:56:01 139946409769088 [ERROR] WSREP: SST failed: 2 (No such file or directory)
      2017-08-27 15:56:01 139946409769088 [ERROR] Aborting
       
      2017-08-27 15:56:01 139919364187904 [Warning] WSREP: 1.0 (node2): State transfer to 2.0 (node1) failed: -22 (Invalid argument)
      2017-08-27 15:56:01 139919364187904 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
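
The joiner aborts because `xtrabackup_checkpoints` never arrived, which is consistent with the donor's backup failing before it streamed anything useful. A minimal sketch of such a presence check is shown below; the function name and the assumption that the file sits directly under the given directory are illustrative only and are not taken from the SST script.

```shell
# Return success if DIR contains the xtrabackup_checkpoints file that the
# joiner log reports as missing. Name and file location are assumptions.
has_checkpoints() {
    [ -f "$1/xtrabackup_checkpoints" ]
}

dir="$(mktemp -d)"
has_checkpoints "$dir" || echo "xtrabackup_checkpoints missing in $dir"
rm -rf "$dir"
```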
      

      And the my.cnf settings (the same on all nodes):

      #
      # This group is read both by the client and the server
      # use it for options that affect everything
      #
      [client-server]
       
      #
      # include all files from the config directory
      #
      !includedir /etc/my.cnf.d
       
      [mysqld]
      skip-name-resolve
      datadir         = /home/mysql/
      tmpdir          = /home/mysqltmp/
      port            = 3306
      socket          = /home/mysql/mysql.sock
       
      # General #
      log_error                      = /home/mysql/node1.err
      default-storage-engine         = innodb
      query_cache_limit              = 0
      query_cache_size               = 0
      query_cache_type               = 0
      bind-address                   = 0.0.0.0
      max-connections                = 500
      max-allowed-packet             = 15M
      table_open_cache               = 60000
      tmp_table_size                 = 512M
       
      #FTS
      innodb_ft_result_cache_limit=256M
       
       
      # Binlog #
      binlog_format                  = ROW
      sync_binlog                    = 0
       
      # Galera #
      wsrep_on=on
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_cluster_address="gcomm://xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx"
      wsrep_cluster_name='galera_cluster'
      wsrep_node_address='xx.xx.xx.xx'
      wsrep_node_name='node1'
      wsrep_provider_options="gcache.size=25G;gcs.fc_limit=4000;gcache.page_size=512M;gcs.fc_master_slave=YES;gcs.fc_factor=1.0;"
      wsrep_sst_method=xtrabackup-v2
      wsrep_sst_auth=xtrabackup:1111111
       
       
      #Replication
      gtid_domain_id=1
      server_id=1
      log_slave_updates=1
      log_bin=binlog
      expire_logs_days=2
       
      # Innodb #
      innodb_file_format             = BARRACUDA
      innodb-flush-method            = O_DIRECT
      innodb_autoinc_lock_mode       = 2
      innodb_log_file_size           = 1G
      innodb_log_buffer_size         = 512M
      innodb_file_per_table          = 1
      innodb_flush_log_at_trx_commit = 0
      innodb_buffer_pool_size        = 100G
      innodb_buffer_pool_instances   = 20
      innodb_write_io_threads        = 16
      innodb_read_io_threads         = 16
      innodb_change_buffering        = all
      transaction-isolation          = READ-COMMITTED
       
      [sst]
      compressor="pigz"
      decompressor="pigz -d"
      

          People

            anikitin Andrii Nikitin (Inactive)
            HamoonDBA Hamoon Mohammadian Pour