-------------------------------------------------------------------------------

Rsync only updates from a source directory to a destinations for which either
can be local or remote.  IE one direction only though timestamps can prevent
newer files from being overwritten.

A new application "unison" is a by-directional syncronization tool.  But
relies on a cache of the directory information.  The cache can take a bit of
time to set up, but once done unison works vary well.  As such it understands
file deletions better than rsync, and knows when the same file on both areas
were updated creating a conflict.

-------------------------------------------------------------------------------
Get a listing from a rsync server

   rsync -avz rsync://samba.anu.edu.au/rsyncftp

-------------------------------------------------------------------------------
Initial syncronization

This is better achieved using full directory copy due to the number of files
involved rather than with rsync. There is little difference in total network
traffic when doing that initial copy.

   tar -C /source/dir -jcf - . | ssh remotehost 'cd /target/dir && tar -jxf -'

The 'j' flag means bzip2 the data during the tar process.

After that rsync can be used to setup its checksum caches, for future updates.

ASIDE: Using the "gzip --rsyncable" flag lets you create a gzip tar file that
rsync can transfer more easily.  That is just rsync the deltas of the tar file
and not the whole file.

-------------------------------------------------------------------------------
Remote to Local transfers

You can specify multiple remote to local file/directory transfers
more easilly with...

    rsync [options]    remote:'file1 file2 dir1 dir2'  .

-------------------------------------------------------------------------------
Remote Commands in Rync ;-)

Some people may not be aware that you can do some tricky things with
remote shell expansion in rsync. The bit after the colon is passed to
the remote shell for expansion, which allows you to not only use
wildcards, but also to use remote programs to specify which files to
transfer (this doesn't work with a rsync daemon as there is no remote
shell).

For example:

    rsync -avze ssh rana:'`find transfer/ -name *.gz`' .

would transfer all *.gz files from the transfer/ directory on the
remote system to the current directory. That's a fairly pointless
example but remember that you can use _any_ options that find accepts,
or even pass the result through a pipe like this:

    rsync -avze ssh \
       rana:'`find transfer/ -user tridge -name *.gz | grep smb`' \
       /tmp/

Many of the fancier things can be done like this. For more complex cases
you can write a shell script at the other end that outputs a file list,
then just call that shell script as part of the rsync process, for
example:

     rsync -avze ssh rana:'`bin/mylist`' /dest/

would run the command "bin/mylist" on the other end and use the
resulting file list as the list of files to transfer.

-------------------------------------------------------------------------------
Specifing the location of the remote "rsync" command.

Depending on the remote accounts shell, and the method used to contact
the remote account, the path to the rsync command (or for other things
like the xterm) may not be defined.

The solution is to...

Define location remote server in the PATH environment (prefered)...

  Login Shell...
    * Set default path in   ".cshrc"   or other shell `dot' file that
      will be automatically read before executing a remote command.
      WARNING: not all shells read a `dot' file

  Ssh Communications...
    * Set in ".ssh/environment"  file (local or remote?)

    * Add shell Commands ".ssh/rc" on the remote machine

  System Wide...
    * Set in in the remote machines global /etc files
      EG:  "/etc/environment"  or  "/etc/default/login" or equivelent

    * set in the ssh config files "/etc/ssh_config" or "/etc/sshrc"
      (local or remote machine?)

As part of rsync command...

 * Use the  --rsync-path giving posible locations rsync MAY be located in.

 * Compile in a default  --rsync-path  into the local rsync

Best idea is to use the defaults on the remote system and update those
defaults if posible.  When that fails then fall back to using a --rsync-path
command line option.

-------------------------------------------------------------------------------
Remote rsync as root

This runs the remote rsync as root (for permissions), by calling it using sudo.
However sudo often needs a password. Resulting in an error...

    sudo: no tty present and no askpass program specified
or
    Pseudo-terminal will not be allocated because stdin is not a terminal.

To get this to work we set up sudo to use a password helper program.

rsync -av -e 'ssh -X' \
  --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' \
  /some/local/path user@remote:/some/remote/path

You can find a graphical askpass program using...
  locate askpass
  locate ask-pass

-------------------------------------------------------------------------------
Making incremental backups with rsync...

Moved to https://antofthy.gitlab.io/info/usage/rsync_backup.hints"

-------------------------------------------------------------------------------
Includes and Excludes

You'll probably be better off using an --exclude-from file.

If you end with an --exclude "*", be sure to include every parent directory of
files that you want to include, or the files below those directories will be
ignored.

It's also safest to start all the include patterns with "/" to make sure they
match the beginning of paths; otherwise the patterns may match the end of some
other pathnames.

You have to include all parent directories for the file you want to include,
though you can exclude other files and directories in that tree.

You also need to include "./" because apparently that is the default
top-level directory name.  For example:

    rsync -r --exclude-from exclude_file remote_host:sub1 local_dir

with an exclude_file that contains

    + ./
    + /sub1/
    + /sub1/sub2/
    + /sub1/sub2/file1
    + /sub1/sub2/file2
    - /*

will retrieve only files sub1/sub2/file1 and sub1/sub2/file2.  I didn't
precede the "./" with a "/" because in the above example it is actually
"/sub1/./" that would be needed if a complete path were given.  If you drop
the "sub1" from the command line above, this exclude_file still works.

The patterns are processed in order and as soon as it hits the exclude * it
will stop looking.


Alturnative...
   do a --include $FILES_TO_TRANSFER   then add a  --exclude '*'

It turns out that if you have no wildcards in your includes and an exclude
'*' at the end, you will trigger an optimization in which the files are
directly opened and most of the include/exclude processing is skipped.
A side effect of this is that you don't actually need to include the parent
directories, although it might not be a good idea to depend on that feature.

---

This fails to to distribute the userf/fsubdir directory

  + /usera/
  + /userf/fsubdir/
  - /userf/*
  - /*

It seems that rsync does not compare every file in the source to its
include/exclude list.  That is, once it finds a directory is excluded, it
doesn't then test subdirectories.  So in this case, it checked /users/userf,
which didn't match "+ /userf/fsubdir/" but did match "- /*".

Thus it didn't ever try /users/userf/fsubdir.
This to oly include that sub-directory, but ignore anything else
you need to use...

  + /usera/
  + /userf/
  + /userf/fsubdir/
  - /userf/*
  - /*

With this rsync tests /users/userf and finds the "+ /userf" so it creates that
directory at the receiving end.  It then tests /users/userf/fsubdir and finds
"+ /userf/fsubdir/" and creates that and copies the included files.  Anything
else in /users/userf is excluded by the "- /userf/*".

Essentually you need to include sub-directorys wanted and then exclude all the
other parts you don't want!  That is two lined for every parent directory
leading to the file you want to actually include.

---
Exclude filename or types globally

You need to know that rsync applies each pattern both to individual name
components as it visits them and to the entire paths at that point.  So an
exclude of ".*" will exclude all dot files in every directory but it will not
exclude sub-directory under them which doesn't begin with dot.  So if you say

    + .netscape/
    - .*

you will exclude things like

    .file
    sub/.file2
    sub1/sub2/.file3
    .netscape/.file4

but you will not exclude

    .netscape/file5
    .netscape/dir1

because of your "+ .netscape/"

So for top level, always start with /
but for general filenames in any included directory, don't start with /

-------------------------------------------------------------------------------
Delete before or after?

The --delete option will recover disk space before starting transfers,
which can be important when space is tight, however it has to scan the
file system once before it even started transfers.  That is two file
system scans are needed.

The --delete-after does the deletes after the transfers have finished,
and as such rsync has already completed the main file system scan.  As
such it has a better performance.  But space will not be freed until
all the new files has been transfered, so you could run out of space
when you actually have enough.


-------------------------------------------------------------------------------
Can you delete source files after transfered ?

Not automatically. But I've done things like:

h=mailserver
d=Maildir
(
        rsync -vaze ssh $h:$d/. $d/ 2>&1 | \
                perl -lne 'print "'$d'/$_" if /^\S+[^\s\/]$/'
) | ssh $h perl -lne unlink

I.e. do an rsync for the pull, tear the names out of the "rsync -v"
output, and pipe 'em back over another ssh to delete the files.

Bennett Todd <bet@rahul.net>


WARNING: Rsync -v lists files that ARE being updated. It does not list
which HAVE been update!  That is it will also list failed transfers.

As such if rsync aborts half way though a file, you could delete a file
which was NOT FULLY TRANSFERED.  Also the current file being transfered
my not be the last listed as directories and file permissions may be
updated in parallel.

It also lists directorys before files (for directory creation).

NOTE: the option  --delete-after  just delays the deletion off the
temporary files on the *destination* side until after all the files have
been transferred. It does NOT delete the file from the *source* side.

You could to patch the source with a "--move-files" option to actually move
files between machines.                  Wayne Davison <wayne@blorf.net>

UPDATE: see new --remove-source-files option

-------------------------------------------------------------------------------
Setting up a secure ssh-rsync daemon on a remote machine.

From rsync@samba.anu.edu.au  Fri Dec 19 05:11:04 1997
From: reynhout@quesera.com
Subject: Re: Using rsync with sshd "command="?
Date:       Fri, 19 Dec 1997 06:10:41 +1100

> I'd like to use rsync with the sshd facility of restricting a given
> public key to a specific command with the "command=" facility of
> authorized_keys.

This is what I do to make this work:

thalia is my desktop, with a big disk array.  talulah is a remote box
that I need to keep synchronized in case of disaster.

>From thalia, I run rsync with the shell set to ssh_wrapper, which
decides how to reach talulah, and runs ssh with the appropriate
args (some of the remote boxes I use this for are behind an SSL
proxy.)

On talulah, sshd is hardwired to run rsync_wrapper, which examines
the SSH_ORIGINAL_COMMAND environment variable, verifies that it
begins with the full path of the rsync executable, and contains only
normal, non-meta characters.  In quick-hack perl:

$line=$ENV{SSH_ORIGINAL_COMMAND};
if ( $line =~ /^\/opt\/bin\/rsync --server --sender / ) {
  # this regexp will need tweaking to handle unusual
  # (but legal) characters in paths.  eg: [_\.]
  ($safeline=$line) =~ s|[^\w\s\d\-\/]||g;
  if ( $line ne $safeline ) { exit 1; }
  system("$line");
} else {
  exit 1;
}

If all these tests are passed, I just run SSH_ORIGINAL_COMMAND.
If any are not passed, I exit.  I used to print a diagnostic, but
rsync on thalia just realized that it isn't getting what it wanted
and said "Incorrect version information.  Is your shell clean?" or
something like that.  It's important that these wrapper scripts
DO NOT OUTPUT ANYTHING, even debugging info, status info, etc.
Otherwise rsync will be interfered with.

**It would be great if rsync would instead say:
**
**   Bad response from remote rsync instance (is your shell clean?):
**   <the first line of output>
**
**or similar.  But for now I just trap the error code.

In this manner, I can change the arguments passed to rsync, the
directories, etc without having to worry about how it appears
when it hits talulah.  Fwiw, the command-line I end up running
(right now) is:

/opt/bin/rsync --server --sender -vnlHogDtprI --delete /

 From a security perspective, I'm relying on my wrapper script
(on talulah) catching any clever attempts.  I'm also allowing
incoming ssh ONLY from thalia (for this user) and placing
limitations on who has login (and physical) access to thalia.
Subversion would require knocking thalia off the net, and getting
thalia's private key.  Even that should only allow you to get a
copy of arbitrary files on talulah.  It shouldn't allow any sort
of shell access or execution of arbitrary commands.

Anyone have any comments?

D Andrew Reynhout
reynhout@quesera.com   "You've got your whole life to do something,
reynhout@milkcrate.com  and that's not very long..." -ani difranco

-------------------------------------------------------------------------------
Network Connection Command

You can pass arguments to the network connection command, For example...

     rsync {options} --rsh 'ssh -x' {files}

Will run ssh WITHOUT it setting up the unneeded X windows communications.

However rsync's parsing of the --rsh option is a highly simplistic
"space" parsing. As such...

     rsync {options}  --rsh 'ssh -x -o "BatchMode yes"' {files}

will NOT work as expected. Ssh will receive the options '"BatchMode' and
'yes"' seperatally, neither of which are valid options.

As a work around in this case, most ssh programs will accept a "=" instead of
a space between option and its argument. (Only openssh-2.1 does not seem to
allow this).

As such for ssh you can aviod the problem with...

     rsync {options}  --rsh 'ssh -x -o BatchMode=yes' {files}


-------------------------------------------------------------------------------
RSync Limitations...

Since rsync builds in memory and transfers the whole file list
before it starts moving any files, a big tree causes (a) a large
memory requirement (b) a long delay before any files starts moving.

The amount of data is basically irrelevant for memory usage. The thing
that matters is the total number of files. Rsync will use about 80
bytes per file at each end (this is very rough, take it as a rule of
rule of thumb only).

This is one of the bad consequences of developing rsync on a machine
with 256MB of ram :)        -- Tridge

I've thought about this as well, and even toyed with writing a simple
script that would call rsync for each top level directory in a similarly
large tree -- but if a tree is relatively small this might actually make
things slower.          -- rsync user

Alternative...
  "Unison" caches the directory information before hand, as such it
knows about file deletions and even file moves. It also does not have to
scan directories so heavily so tends to be a lot faster that rsync on large
directory synchronization.

However caching is only useful for regular file coping, and not so
useful for once off coping, without some final cleanup of the cached
data.

Its purpose remember is for bi-directional synchronization, NOT
uni-directional copying.  However it does have flags to make it work in
one direction only.

-------------------------------------------------------------------------------
Limiting Network/Disk I/O

   -bwlimit=10000      # limit transfer to 10MB/s

-------------------------------------------------------------------------------
Rsync over Tor
via a SSH tunnel or socat

#!/bin/bash
#bash shell script for rsync retrieval of QSH
#QSH Network Mirror #2 (AKA "C4") - c4q7xthv2kltrqso.onion port 873

socat TCP4-LISTEN:46055,fork \
      SOCKS4A:localhost:c4q7xthv2kltrqso.onion:873,socksport=9050 \
      >qsh_rsync_log.txt 2>&1 &

echo "Please wait..."
rsync -v -z --timeout=300 --contimeout=12 rsync://localhost:46055 ./qsh

sleep 12s

#kills socat
kill %-

-------------------------------------------------------------------------------