[MDEV-13657] One of the nodes can't rejoin to cluster with Xtrabackup-v2 SST method Created: 2017-08-27  Updated: 2017-08-28  Resolved: 2017-08-28

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST, Storage Engine - InnoDB
Affects Version/s: 10.2.8
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Hamoon Mohammadian Pour Assignee: Andrii Nikitin (Inactive)
Resolution: Not a Bug Votes: 0
Labels: galera, innodb, mariadb
Environment:

CentOS 7



 Description   

A few days ago we upgraded our 3-node cluster from MariaDB 10.1 to 10.2.
Today one of the nodes became corrupted.
So we decided to rejoin that node to the cluster using SST, so that it takes a full backup from a donor and comes back.
But a few seconds after I start MariaDB on that node, an error occurs.
I checked the error log but could not work out why this error occurs.
This problem never occurred on 10.0 or 10.1: whenever a node needed an SST, everything worked well. But now...

This is the error log of the donor node:

WSREP_SST: [INFO] Evaluating innobackupex   --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | pigz | socat -u stdio TCP:xx.xx.xx.xx:4444; RC=( ${PIPESTATUS[@]} ) (20170827 15:51:01.065)
2017-08-27 15:51:01 140610122925824 [Warning] Aborted connection 834 to db: 'unconnected' user: 'xtrabackup' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /home/mysql//innobackup.backup.log (20170827 15:51:01.210)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20170827 15:51:01.212)
2017-08-27 15:51:01 140609528502016 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'xx.xx.xx.xx:4444/xtrabackup_sst//1' --socket '/home/mysql/mysql.sock' --datadir '/home/mysql/'    --binlog 'binlog' --gtid 'dda62f30-e8ff-11e5-b5ce-3754b5ddaee3:8614408762' --gtid-domain-id '0'
2017-08-27 15:51:01 140609528502016 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'xx.xx.xx.xx:4444/xtrabackup_sst//1' --socket '/home/mysql/mysql.sock' --datadir '/home/mysql/'    --binlog 'binlog' --gtid 'dda62f30-e8ff-11e5-b5ce-3754b5ddaee3:8614408762' --gtid-domain-id '0': 22 (Invalid argument)
2017-08-27 15:51:01 140609528502016 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'xx.xx.xx.xx:4444/xtrabackup_sst//1' --socket '/home/mysql/mysql.sock' --datadir '/home/mysql/'    --binlog 'binlog' --gtid 'dda62f30-e8ff-11e5-b5ce-3754b5ddaee3:8614408762' --gtid-domain-id '0'
2017-08-27 15:51:01 140656739600128 [Warning] WSREP: 0.0 (node3): State transfer to 2.0 (node1) failed: -22 (Invalid argument)
2017-08-27 15:51:01 140656739600128 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 8614408973)
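
When triaging a donor log like the one above, the useful lines are the `WSREP_SST: [ERROR]` entries, one of which names the innobackupex log to read next. A minimal sketch of pulling those out, assuming the line layout seen in this ticket (the function name `sst_errors` and both regular expressions are illustrative, not part of any Galera tooling):

```python
import re

# Matches SST script error lines in the shape seen in this ticket:
#   WSREP_SST: [ERROR] <message> (<YYYYMMDD HH:MM:SS.mmm>)
SST_ERROR = re.compile(r"^WSREP_SST: \[ERROR\] (?P<msg>.*?)(?: \(\d{8} [\d:.]+\))?$")
# Some messages point at a follow-up log file: "Check /path/to/log"
CHECK_HINT = re.compile(r"Check (?P<path>/\S+)")


def sst_errors(log_text):
    """Return (message, referenced_log_path_or_None) for each WSREP_SST error line."""
    results = []
    for line in log_text.splitlines():
        match = SST_ERROR.match(line.strip())
        if not match:
            continue  # not an SST script error line
        msg = match.group("msg")
        hint = CHECK_HINT.search(msg)
        results.append((msg, hint.group("path") if hint else None))
    return results


# Two lines copied from the donor log in this report.
donor_log = """\
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /home/mysql//innobackup.backup.log (20170827 15:51:01.210)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20170827 15:51:01.212)
"""

for msg, path in sst_errors(donor_log):
    print(path or "-", "|", msg)
```

Here the first error line carries the path of the backup log, which is where the actual cause shows up.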

And this is innobackup.backup.log:

170827 15:51:01 innobackupex: Starting the backup operation
 
IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".
 
170827 15:51:01 Connecting to MySQL server host: localhost, user: xtrabackup, password: set, port: not set, socket: /home/mysql/mysql.sock
Using server version 10.2.8-MariaDB-log
innobackupex version 2.3.9 based on MySQL server 5.6.24 Linux (x86_64) (revision id: fde0e3e)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /home/mysql/
xtrabackup: open files limit requested 0, set to 16364
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 1073741824
xtrabackup: using O_DIRECT
InnoDB: No valid checkpoint found.
InnoDB: If this error appears when you are creating an InnoDB database,
InnoDB: the problem may be that during an earlier attempt you managed
InnoDB: to create the InnoDB data files, but log file creation failed.
InnoDB: If that is the case, please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/error-creating-innodb.html

Error log of the joiner:

WSREP_SST: [INFO] Waiting for SST streaming to complete! (20170827 15:55:51.479)
2017-08-27 15:55:52 139946286794496 [Note] WSREP: (7a3de297, 'tcp://0.0.0.0:4567') connection to peer 7a3de297 with addr tcp://xx.xx.xx.xx:4567 timed out, no messages seen in PT3S
2017-08-27 15:55:52 139946286794496 [Note] WSREP: (7a3de297, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20170827 15:56:01.566)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20170827 15:56:01.568)
2017-08-27 15:56:01 139919355795200 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address 'xx.xx.xx.xx' --datadir '/home/mysql/'   --parent '1313' --binlog 'binlog' : 2 (No such file or directory)
2017-08-27 15:56:01 139919355795200 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2017-08-27 15:56:01 139946409769088 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2017-08-27 15:56:01 139946409769088 [ERROR] Aborting
 
2017-08-27 15:56:01 139919364187904 [Warning] WSREP: 1.0 (node2): State transfer to 2.0 (node1) failed: -22 (Invalid argument)
2017-08-27 15:56:01 139919364187904 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.

And the my.cnf settings (these are the same on all nodes):

#
# This group is read both by the client and the server
# use it for options that affect everything
#
[client-server]
 
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
 
[mysqld]
skip-name-resolve
datadir         = /home/mysql/
tmpdir          = /home/mysqltmp/
port            = 3306
socket          = /home/mysql/mysql.sock
 
# General #
log_error                      = /home/mysql/node1.err
default-storage-engine         = innodb
query_cache_limit              = 0
query_cache_size               = 0
query_cache_type               = 0
bind-address                   = 0.0.0.0
max-connections                = 500
max-allowed-packet             = 15M
table_open_cache               = 60000
tmp_table_size                 = 512M
 
#FTS
innodb_ft_result_cache_limit=256M
 
 
# Binlog #
binlog_format                  = ROW
sync_binlog                    = 0
 
# Galera #
wsrep_on=on
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='xx.xx.xx.xx'
wsrep_node_name='node1'
wsrep_provider_options="gcache.size=25G;gcs.fc_limit=4000;gcache.page_size=512M;gcs.fc_master_slave=YES;gcs.fc_factor=1.0;"
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=xtrabackup:1111111
 
 
#Replication
gtid_domain_id=1
server_id=1
log_slave_updates=1
log_bin=binlog
expire_logs_days=2
 
# Innodb #
innodb_file_format             = BARRACUDA
innodb-flush-method            = O_DIRECT
innodb_autoinc_lock_mode       = 2
innodb_log_file_size           = 1G
innodb_log_buffer_size         = 512M
innodb_file_per_table          = 1
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_size        = 100G
innodb_buffer_pool_instances   = 20
innodb_write_io_threads        = 16
innodb_read_io_threads         = 16
innodb_change_buffering        = all
transaction-isolation          = READ-COMMITTED
 
[sst]
compressor="pigz"
decompressor="pigz -d"



 Comments   
Comment by Andrii Nikitin (Inactive) [ 2017-08-27 ]

You must upgrade xtrabackup to 2.4, because 2.3 cannot handle 10.2 and, as expected, fails with the error 'No valid checkpoint found'.
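
The mismatch is visible in the backup log itself, which prints both the server version and the innobackupex version near the top. A minimal sketch of checking for it, assuming the two log line formats shown in this ticket; the helper names are hypothetical, and the only rule encoded is the one stated here (a 10.2 server needs xtrabackup 2.4 or newer), not a general compatibility matrix:

```python
import re


def parse_versions(backup_log):
    """Extract (server_version, tool_version) as integer tuples from an innobackupex log."""
    server = re.search(r"Using server version (\d+)\.(\d+)\.(\d+)", backup_log)
    tool = re.search(r"innobackupex version (\d+)\.(\d+)\.(\d+)", backup_log)
    if not server or not tool:
        return None  # one of the expected lines is missing
    return tuple(map(int, server.groups())), tuple(map(int, tool.groups()))


def tool_too_old(server, tool):
    # Assumption taken only from this ticket: MariaDB 10.2 requires xtrabackup >= 2.4.
    return server[:2] == (10, 2) and tool[:2] < (2, 4)


# Two lines copied from innobackup.backup.log in this report.
log = """\
Using server version 10.2.8-MariaDB-log
innobackupex version 2.3.9 based on MySQL server 5.6.24 Linux (x86_64) (revision id: fde0e3e)
"""

server, tool = parse_versions(log)
if tool_too_old(server, tool):
    print("xtrabackup %d.%d.%d is too old for server %d.%d.%d" % (tool + server))
```

A check like this could run before triggering an SST, so that the failure is reported as a version problem rather than as a missing checkpoint.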

Comment by Hamoon Mohammadian Pour [ 2017-08-28 ]

Oh, thank you Andrii.
We were using xtrabackup 2.3.9.
After upgrading to 2.4.8, the problem is solved.
