[MDEV-13789] mariabackup galera SST fail Created: 2017-09-13  Updated: 2022-01-27  Resolved: 2022-01-25

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.2.7
Fix Version/s: 10.1.31, 10.2.14

Type: Bug Priority: Major
Reporter: TAO ZHOU Assignee: Julius Goryavsky
Resolution: Fixed Votes: 1
Labels: contribution, foundation
Environment:

FreeBSD 11.1


Issue Links:
Blocks
is blocked by MDEV-14030 Remove or Merge wsrep_sst_mariabackup Closed
Problem/Incident
causes MDEV-15355 upgrade to 10.1.31 break Galera SST/I... Closed
Relates
relates to MDEV-13478 Full SST sync fails because of the er... Closed

 Description   

I am running MariaDB 10.2.7 in a multi-master setup with 3 nodes.

On the joiner node, I get the following error:

mbstream: Can't create/write to file './ib_logfile0' (Errcode: 2 "No such file or directory")
mbstream: failed to create file.
2017/09/13 12:23:03 socat[77169] E write(1, 0x801e46f40, 8192): Broken pipe
++ RC=(${PIPESTATUS[@]})

I tried changing the streaming method to tar in wsrep_sst_mariabackup, but it seems mariabackup only supports xbstream.



 Comments   
Comment by TAO ZHOU [ 2017-09-14 ]

I have made it work.
It's a bug in wsrep_sst_mariabackup.
The SST script cleans up everything before the backup begins.
The problem is that it creates the .sst directory first and only then runs the cleanup.

A simple fix is to move 'mkdir -p ${DATA}/.sst' after the cleanup.

--- wsrep_sst_mariabackup.orig	2017-09-13 13:42:27.850390000 +1000
+++ wsrep_sst_mariabackup	2017-09-14 12:28:49.954901000 +1000
@@ -889,15 +889,15 @@
             wsrep_log_info "WARNING: Stale temporary SST directory: ${DATA}/.sst from previous state transfer. Removing"
             rm -rf ${DATA}/.sst
         fi
+
+        wsrep_log_info "Cleaning the existing datadir and innodb-data/log directories"
+        find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1  -regex $cpat  -prune  -o -exec rm -rfv {} 1>&2 \+
+
         mkdir -p ${DATA}/.sst
         (recv_joiner $DATA/.sst "${stagemsg}-SST" 0 0) &
         jpid=$!
         wsrep_log_info "Proceeding with SST"
 
-
-        wsrep_log_info "Cleaning the existing datadir and innodb-data/log directories"
-        find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1  -regex $cpat  -prune  -o -exec rm -rfv {} 1>&2 \+
-
         tempdir=$(parse_cnf mysqld log-bin "")
         if [[ -n ${tempdir:-} ]];then
             binlog_dir=$(dirname $tempdir)
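
The reordering in the patch above can be illustrated in isolation. The following is a minimal sketch, not the SST script itself: the throwaway directory, the file name stale_file and the simplified cleanup (no $cpat exclusions, top level only) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: DATA is a throwaway directory, not a real datadir.
DATA=$(mktemp -d)
touch "$DATA/stale_file"

# 1. Clean the directory first (simplified: no exclusion pattern).
find "$DATA" -mindepth 1 -maxdepth 1 -exec rm -rf {} +

# 2. Only then create the staging directory the joiner receives into.
mkdir -p "$DATA/.sst"

ls -A "$DATA"
```

With this order the stale file is gone and the freshly created .sst directory survives, regardless of whether the cleanup would have excluded it.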

Comment by Andrii Nikitin (Inactive) [ 2017-09-21 ]

I definitely see good reasoning here, but the .sst folder should be excluded from the `find` command by the '$cpat' value defined earlier in the script.
This is what I tried in order to prove that:

$ mkdir datatest
$ cd datatest/
$ mkdir .sst
$ touch .sst/aaa
$ touch aaa
$ find .
.
./.sst
./.sst/aaa
./aaa
$ cpat='.*galera\.cache$\|.*sst_in_progress$\|.*\.sst$\|.*gvwstate\.dat$\|.*grastate\.dat$\|.*\.err$\|.*\.log$\|.*RPM_UPGRADE_MARKER$\|.*RPM_UPGRADE_HISTORY$'
$ find . -mindepth 1  -regex $cpat  -prune -o -exec echo xx: {} +
xx: ./aaa

The last line doesn't print the .sst folder, so the find command wouldn't attempt to clean it up.

After writing the conclusion above (and based on my previous bash experience), I decided to try the same commands with different bash nullglob settings:

$ shopt -s nullglob
$ find $(pwd) -mindepth 1  -regex '.*\.sst' -prune -o -exec echo xx: {} +
xx: /home/a/datatest/aaa
$ find $(pwd) -mindepth 1  -regex $cpat -prune -o -exec echo xx: {} +
xx: /home/a/datatest/.sst /home/a/datatest/.sst/aaa /home/a/datatest/aaa
$ shopt -u nullglob
$ find $(pwd) -mindepth 1  -regex $cpat -prune -o -exec echo xx: {} +
xx: /home/a/datatest/aaa

Is there a chance that the line "shopt -s nullglob" is present somewhere in .bashrc or a similar place in your environment? Could you try the same commands in your environment and confirm whether you observe the same behavior with the default nullglob setting?
I believe that to make the script tolerant of the nullglob configuration, $cpat must be wrapped in double quotes (""). I shall open a dedicated bug for that:

$ shopt -u nullglob
$ find $(pwd) -mindepth 1  -regex $cpat -prune -o -exec echo xx: {} +
xx: /home/a/datatest/aaa
$ shopt -s nullglob
$ find $(pwd) -mindepth 1  -regex $cpat -prune -o -exec echo xx: {} +
xx: /home/a/datatest/.sst /home/a/datatest/.sst/aaa /home/a/datatest/aaa
$ find $(pwd) -mindepth 1  -regex "$cpat" -prune -o -exec echo xx: {} +
xx: /home/a/datatest/aaa
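
The effect can be reproduced without find at all. The sketch below runs in a child bash (nullglob is a bash option) so that the setting does not leak into the calling shell. count_args is a hypothetical helper that only reports how many arguments it received, and the shortened pattern and the empty temporary directory are chosen for illustration. With nullglob set, the unquoted $cpat is treated as a glob, matches no file name and is dropped from the command line, which is presumably why -regex ends up consuming -prune as its pattern in the find commands above.

```shell
#!/bin/sh
# Run the demonstration in a child bash so nullglob stays contained.
result=$(bash -s <<'EOF'
count_args() { echo "$#"; }   # hypothetical helper: number of arguments
cpat='.*\.sst$\|.*grastate\.dat$'
cd "$(mktemp -d)" || exit 1   # empty directory: the pattern matches no file
shopt -s nullglob
# Unquoted: the word is glob-expanded and removed. Quoted: passed as is.
echo "unquoted=$(count_args $cpat) quoted=$(count_args "$cpat")"
EOF
)
echo "$result"   # prints: unquoted=0 quoted=1
```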

Comment by TAO ZHOU [ 2017-09-23 ]

Which bash version were you using? I was running it on FreeBSD, so maybe I was using csh. Is the find command built into the shell? I found that the problem was not with $cpat but with the '-o' option: with -o, it always deletes everything, no matter what regex you put there. This also removes grastate.dat and gvwstate.dat, causing further issues.

Comment by Andrii Nikitin (Inactive) [ 2017-09-25 ]

I've checked csh on FreeBSD 11.1, and after the commands below find ignores the .sst folder, so the problem is not in the '-o' option:

mkdir datatest
cd datatest/
mkdir .sst
touch .sst/aaa
touch aaa
find . -mindepth 1  -regex '.*\.sst$'  -prune -o -exec echo {} +

Comment by TAO ZHOU [ 2017-10-11 ]

Hi Andrii,

If I use '.*\.sst$', I get the same result as you did, but it didn't work with $cpat.
The following command didn't work either:

find . -mindepth 1  -regex '.*\.sst$\|.*grastate\.dat$'  -prune -o -exec echo {} +

I think BSD find might use a different regular expression syntax, because I tried installing GNU find and it worked:

gfind . -mindepth 1  -regex '.*\.sst$\|.*grastate\.dat$'  -prune -o -exec echo {} +

$
$ cpat='.galera\.cache$|.*sst_in_progress$|.\.sst$|.gvwstate\.dat$|.*grastate\.dat$|.\.err$|.*\.log$'
$ find . -mindepth 1 -regex $cpat -prune -o -exec echo xx: {} +
xx: ./grastate.dat ./.sst ./.sst/aaa
$

Comment by TAO ZHOU [ 2017-10-11 ]

After some testing, I found that the following command works on FreeBSD. Note that the OR operator "|" must not be escaped here, since -E makes find interpret the pattern as an extended regular expression.

$ find -E . -mindepth 1   -regex '.*galera\.cache$|.*sst_in_progress$|.*\.sst$|.*gvwstate\.dat$|.*grastate\.dat$|.*\.err$|.*\.log$' -prune -o -exec echo {} +
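
Since BSD find and GNU find enable extended regular expressions differently (-E on BSD, -regextype posix-extended on GNU), a script that has to run on both could probe for the flavor first. The following is a sketch under that assumption, not the fix that was committed. list_unprotected is a hypothetical helper name, and the shortened pattern and the temporary directory exist only for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: print everything under directory $1 except paths
# matching the extended regular expression $2 (matches are pruned).
list_unprotected() {
    dir=$1
    pat=$2
    if find -E "$dir" -maxdepth 0 >/dev/null 2>&1; then
        # BSD find: -E switches -regex to extended regular expressions.
        find -E "$dir" -mindepth 1 -regex "$pat" -prune -o -print
    else
        # GNU find: select the regex dialect with -regextype.
        find "$dir" -mindepth 1 -regextype posix-extended -regex "$pat" -prune -o -print
    fi
}

demo=$(mktemp -d)
mkdir "$demo/.sst"
touch "$demo/.sst/aaa" "$demo/aaa" "$demo/grastate.dat"

# The alternation operator is a plain, unescaped "|" in both branches.
kept=$(list_unprotected "$demo" '.*\.sst$|.*grastate\.dat$')
echo "$kept"
```

Only the aaa file at the top of the demo directory should be listed. The .sst directory (with its contents) and grastate.dat are pruned, which is the behavior the cleanup step relies on.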

Comment by Daniel Black [ 2018-01-30 ]

Commit https://github.com/MariaDB/server/pull/560/commits/be83785d7363f1792d0be8a2e9c9b59c8e01392e corrected this for wsrep_sst_xtrabackup-v2. Because MDEV-14030 plans to merge the two SST mechanisms, this will also be fixed once that is resolved.

Comment by Julius Goryavsky [ 2022-01-25 ]

Closed, as this issue was fixed long ago (in 10.1.31 and 10.2.14) and the fix has already been migrated to all other versions. In addition, the new fix for MDEV-27524 contains further enhancements to correctly handle --datadir, --innodb-data-home-dir and other directories.

Generated at Thu Feb 08 08:08:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.