[MDEV-8735] galera rsync sst - tar of binlog to file before transfer waste of IO and space Created: 2015-09-03  Updated: 2020-12-07  Resolved: 2020-12-07

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.0.21-galera
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix Votes: 0
Labels: beginner-friendly, galera
Environment:

rhel6


Issue Links:
Relates
relates to MDEV-15436 If log_bin and log_bin_index is diffe... Closed

 Description   

Started an SST transfer.

The donor ran out of disk space because of the wsrep_sst_binlog.tar file it created.

Rather than creating the tar file, could the rsync method just add the binlog files named in the index file to the filter?

https://github.com/MariaDB/server/blob/10.0-galera/scripts/wsrep_sst_rsync.sh#L159



 Comments   
Comment by Daniel Black [ 2015-09-04 ]

rsync has an --include-from= option, which could perhaps point directly at the index file.
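A minimal sketch of that idea, assuming the index lists one binlog path per line. Index entries may be relative (`./mysql-bin.000001`) or absolute, so pointing `--include-from` at the index verbatim may not match as intended. This sketch rewrites the last N entries into basename patterns anchored at the transfer root. The helper name `make_binlog_include_file` is hypothetical, not part of wsrep_sst_rsync.sh.

```bash
#!/usr/bin/env bash
# Sketch (assumption): turn the last N entries of a binlog index into a
# pattern file for rsync --include-from=FILE.
# Each entry, relative or absolute, becomes "/<basename>", a pattern
# anchored at the root of the transfer.

# make_binlog_include_file INDEX_FILE N_FILES OUT_FILE   (hypothetical helper)
make_binlog_include_file() {
    local index_file=$1 n_files=$2 out_file=$3
    tail -n "$n_files" "$index_file" | while IFS= read -r binlog_path; do
        [ -n "$binlog_path" ] || continue   # skip blank lines
        printf '/%s\n' "$(basename "$binlog_path")"
    done > "$out_file"
}
```

rsync acts on the first filter rule that matches a path. This means `--include-from=` would have to appear on the command line before the catch-all `- /*` rule in `FILTER`, or the binlog patterns would never be consulted.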

Comment by Daniel Black [ 2015-09-04 ]

Work in progress. This patch wasn't sufficient to copy the binlogs (or it may need to be applied on both ends of the connection).

--- /usr/bin/wsrep_sst_rsync.orig       2015-09-03 15:23:23.443343327 +0900
+++ /usr/bin/wsrep_sst_rsync    2015-09-04 08:12:08.562332162 +0900
@@ -79,9 +79,7 @@
 MAGIC_FILE="$WSREP_SST_OPT_DATA/rsync_sst_complete"
 rm -rf "$MAGIC_FILE"
 
-BINLOG_TAR_FILE="$WSREP_SST_OPT_DATA/wsrep_sst_binlog.tar"
 BINLOG_N_FILES=1
-rm -f "$BINLOG_TAR_FILE" || :
 
 if ! [ -z $WSREP_SST_OPT_BINLOG ]
 then
@@ -114,7 +112,7 @@
 
 # New filter - exclude everything except dirs (schemas) and innodb files
 FILTER=(-f '- /lost+found' -f '- /.fseventsd' -f '- /.Trashes'
-        -f '+ /wsrep_sst_binlog.tar' -f '+ /ib_lru_dump' -f '+ /ibdata*' -f '+ /*/' -f '- /*')
+        -f '+ /ib_lru_dump' -f '+ /ibdata*' -f '+ /*/' -f '- /*')
 
 if [ "$WSREP_SST_OPT_ROLE" = "donor" ]
 then
@@ -148,16 +146,10 @@
             # Prepare binlog files
             pushd $BINLOG_DIRNAME &> /dev/null
             binlog_files_full=$(tail -n $BINLOG_N_FILES ${BINLOG_FILENAME}.index)
-            binlog_files=""
             for ii in $binlog_files_full
             do
-                binlog_files="$binlog_files $(basename $ii)"
+                FILTER+=( -f  '+ /'$(basename $ii) )
             done
-            if ! [ -z "$binlog_files" ]
-            then
-                wsrep_log_info "Preparing binlog files for transfer:"
-                tar -cvf $BINLOG_TAR_FILE $binlog_files >&2
-            fi
             popd &> /dev/null
         fi
 
@@ -307,7 +299,6 @@
             # Clean up old binlog files first
             rm -f ${BINLOG_FILENAME}.*
             wsrep_log_info "Extracting binlog files:"
-            tar -xvf $BINLOG_TAR_FILE >&2
             for ii in $(ls -1 ${BINLOG_FILENAME}.*)
             do
                 echo ${BINLOG_DIRNAME}/${ii} >> ${BINLOG_FILENAME}.index
@@ -329,6 +320,4 @@
     exit 22 # EINVAL
 fi
 
-rm -f $BINLOG_TAR_FILE || :
-
 exit 0
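One detail worth checking in the patch above is filter-rule order. rsync acts on the first filter rule that matches a path. In the hunk shown, `FILTER+=(...)` appends the `+ /<binlog>` rules after the catch-all `- /*`, so they may never take effect. This is an unverified guess and could be part of why the patch wasn't sufficient. The hedged bash sketch below places the binlog includes before the catch-all and quotes the basename. `build_filter` is a hypothetical helper, not the committed fix.

```bash
#!/usr/bin/env bash
# Sketch (assumption, not the committed fix): build FILTER with the binlog
# include rules placed before the catch-all "- /*" exclusion.

# build_filter INDEX_FILE N_FILES   (hypothetical helper; sets global FILTER)
build_filter() {
    local index_file=$1 n_files=$2 binlog_path
    # Exclusions and InnoDB files, as in the patched FILTER.
    FILTER=(-f '- /lost+found' -f '- /.fseventsd' -f '- /.Trashes'
            -f '+ /ib_lru_dump' -f '+ /ibdata*')
    # One quoted include rule per binlog named in the last N index lines.
    while IFS= read -r binlog_path; do
        [ -n "$binlog_path" ] || continue
        FILTER+=(-f "+ /$(basename "$binlog_path")")
    done < <(tail -n "$n_files" "$index_file")
    # Schema directories, then exclude everything else; "- /*" must stay last.
    FILTER+=(-f '+ /*/' -f '- /*')
}
```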

Comment by Nirbhay Choubey (Inactive) [ 2015-11-19 ]

danblack This is a good idea. A few things to consider:

  • How will it play with different versions of SST scripts on donor/joiner?
  • How will it work when the binlog directory on the joiner is different from, or outside, the data directory?
Comment by Daniel Black [ 2015-11-20 ]

For compatibility with an older joiner:

On the donor side, I can see this being done with rsync's batch mode. rsync --write-batch creates a file that can effectively be used by the protocol. The trick would be getting a process to generate the same format: a header, the tar file to standard output, and a footer. So something like (echo ... ; tar ... ; echo ) | rsync .... --read-batch=- . I haven't checked how this plays out when other sync options are present.

Looking at the rsync source, the batch file format seems to be exactly what is sent over the wire. I haven't found a good document describing the wire protocol (https://code.activestate.com/recipes/577518-rsync-algorithm/ ?).

For the joiner side, I can't see anything plausible using rsync alone (maybe a process that filters the entire stream, handles the tar separately, and pushes the rest to rsync --read-batch). Handing off to a tar -x process through a named pipe is too dumb for rsync.

Overall, if it can't be done neatly, maybe a new rsync SST implementation (e.g. rsync_v2) could address this and other deficiencies of rsync v1.

Up to you of course.

Comment by Daniel Black [ 2018-01-14 ]

Actually, ignore my previous comment; this is a lot easier than I thought.

Looking at the current wsrep_sst_rsync method, the server creates two rsync module names. A third one just needs to be created for the binlogs.
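The sketch below shows what a joiner-side rsyncd.conf could look like with a binlog module. It assumes the naming from the compatibility notes in this comment (`rsync_sst` plus a `_binlogs` suffix). Only the data-dir module and the new one are shown. The helper name and exact option set are illustrative and are not taken from wsrep_sst_rsync.sh. In shell, `$MODULE_binlogs` expands a variable literally named `MODULE_binlogs`, so the braced form `${MODULE}_binlogs` is needed.

```bash
#!/usr/bin/env bash
# Sketch (assumption): write an rsyncd.conf with a third module for binlogs.
# Function name, arguments and option set are hypothetical.

# write_rsyncd_conf CONF_FILE MODULE DATA_DIR BINLOG_DIR   (hypothetical helper)
write_rsyncd_conf() {
    local conf_file=$1 module=$2 data_dir=$3 binlog_dir=$4
    # "${module}_binlogs" needs braces: "$module_binlogs" would expand an
    # unrelated (probably unset) variable named module_binlogs.
    cat > "$conf_file" <<EOF
use chroot = no
read only = no
timeout = 300
[$module]
    path = $data_dir
[${module}_binlogs]
    path = $binlog_dir
EOF
}
```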

Compatibility:

donor:

  • The donor will attempt to rsync the binlogs to rsync://$WSREP_SST_OPT_ADDR/rsync_sst_binlogs; if that fails, it falls back to creating a tar file.

joiner:

  • Create an rsync module $MODULE_binlogs pointing at the right path.
  • The existing code will unpack a tarball if one was transferred.
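The donor-side fallback above can be sketched as follows. The sketch makes three assumptions: the joiner exposes `${MODULE}_binlogs`, `ADDR` is a plain `host:port`, and any rsync failure means an older joiner. `send_binlogs` and its arguments are hypothetical. The tar path would be the existing `$WSREP_SST_OPT_DATA/wsrep_sst_binlog.tar`.

```bash
#!/usr/bin/env bash
# Sketch (assumption): the donor tries the new binlog module first and falls
# back to the old tar-file path if the joiner does not accept it.

# send_binlogs ADDR MODULE BINLOG_DIR TAR_FILE BINLOG...   (hypothetical helper)
send_binlogs() {
    local addr=$1 module=$2 binlog_dir=$3 tar_file=$4
    shift 4
    # New path: copy the binlogs straight into the joiner's binlog module.
    if (cd "$binlog_dir" && rsync --archive "$@" "rsync://$addr/${module}_binlogs/"); then
        return 0
    fi
    # Fallback for an older joiner: build the tar file that the existing
    # joiner code already knows how to extract.
    (cd "$binlog_dir" && tar -cf "$tar_file" "$@")
}
```

Treating every rsync failure as "older joiner" is a simplification. A real implementation would probably want to tell a missing module apart from a network error before falling back.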
Generated at Thu Feb 08 07:29:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.