------------------------------------------------------------------------------- Rsync only updates from a source directory to a destinations for which either can be local or remote. IE one direction only though timestamps can prevent newer files from being overwritten. A new application "unison" is a by-directional syncronization tool. But relies on a cache of the directory information. The cache can take a bit of time to set up, but once done unison works vary well. As such it understands file deletions better than rsync, and knows when the same file on both areas were updated creating a conflict. ------------------------------------------------------------------------------- Get a listing from a rsync server rsync -avz rsync://samba.anu.edu.au/rsyncftp ------------------------------------------------------------------------------- Initial syncronization This is better achieved using full directory copy due to the number of files involved rather than with rsync. There is little difference in total network traffic when doing that initial copy. tar -C /source/dir -jcf - . | ssh remotehost 'cd /target/dir && tar -jxf -' The 'j' flag means bzip2 the data during the tar process. After that rsync can be used to setup its checksum caches, for future updates. ASIDE: Using the "gzip --rsyncable" flag lets you create a gzip tar file that rsync can transfer more easily. That is just rsync the deltas of the tar file and not the whole file. ------------------------------------------------------------------------------- Remote to Local transfers You can specify multiple remote to local file/directory transfers more easilly with... rsync [options] remote:'file1 file2 dir1 dir2' . ------------------------------------------------------------------------------- Remote Commands in Rync ;-) Some people may not be aware that you can do some tricky things with remote shell expansion in rsync. The bit after the colon is passed to the remote shell for expansion, which allows you to not only use wildcards, but also to use remote programs to specify which files to transfer (this doesn't work with a rsync daemon as there is no remote shell). For example: rsync -avze ssh rana:'`find transfer/ -name *.gz`' . would transfer all *.gz files from the transfer/ directory on the remote system to the current directory. That's a fairly pointless example but remember that you can use _any_ options that find accepts, or even pass the result through a pipe like this: rsync -avze ssh \ rana:'`find transfer/ -user tridge -name *.gz | grep smb`' \ /tmp/ Many of the fancier things can be done like this. For more complex cases you can write a shell script at the other end that outputs a file list, then just call that shell script as part of the rsync process, for example: rsync -avze ssh rana:'`bin/mylist`' /dest/ would run the command "bin/mylist" on the other end and use the resulting file list as the list of files to transfer. ------------------------------------------------------------------------------- Specifing the location of the remote "rsync" command. Depending on the remote accounts shell, and the method used to contact the remote account, the path to the rsync command (or for other things like the xterm) may not be defined. The solution is to... Define location remote server in the PATH environment (prefered)... Login Shell... * Set default path in ".cshrc" or other shell `dot' file that will be automatically read before executing a remote command. WARNING: not all shells read a `dot' file Ssh Communications... * Set in ".ssh/environment" file (local or remote?) * Add shell Commands ".ssh/rc" on the remote machine System Wide... * Set in in the remote machines global /etc files EG: "/etc/environment" or "/etc/default/login" or equivelent * set in the ssh config files "/etc/ssh_config" or "/etc/sshrc" (local or remote machine?) As part of rsync command... * Use the --rsync-path giving posible locations rsync MAY be located in. * Compile in a default --rsync-path into the local rsync Best idea is to use the defaults on the remote system and update those defaults if posible. When that fails then fall back to using a --rsync-path command line option. ------------------------------------------------------------------------------- Remote rsync as root This runs the remote rsync as root (for permissions), by calling it using sudo. However sudo often needs a password. Resulting in an error... sudo: no tty present and no askpass program specified or Pseudo-terminal will not be allocated because stdin is not a terminal. To get this to work we set up sudo to use a password helper program. rsync -av -e 'ssh -X' \ --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' \ /some/local/path user@remote:/some/remote/path You can find a graphical askpass program using... locate askpass locate ask-pass ------------------------------------------------------------------------------- Making incremental backups with rsync... Moved to https://antofthy.gitlab.io/info/usage/rsync_backup.hints" ------------------------------------------------------------------------------- Includes and Excludes You'll probably be better off using an --exclude-from file. If you end with an --exclude "*", be sure to include every parent directory of files that you want to include, or the files below those directories will be ignored. It's also safest to start all the include patterns with "/" to make sure they match the beginning of paths; otherwise the patterns may match the end of some other pathnames. You have to include all parent directories for the file you want to include, though you can exclude other files and directories in that tree. You also need to include "./" because apparently that is the default top-level directory name. For example: rsync -r --exclude-from exclude_file remote_host:sub1 local_dir with an exclude_file that contains + ./ + /sub1/ + /sub1/sub2/ + /sub1/sub2/file1 + /sub1/sub2/file2 - /* will retrieve only files sub1/sub2/file1 and sub1/sub2/file2. I didn't precede the "./" with a "/" because in the above example it is actually "/sub1/./" that would be needed if a complete path were given. If you drop the "sub1" from the command line above, this exclude_file still works. The patterns are processed in order and as soon as it hits the exclude * it will stop looking. Alturnative... do a --include $FILES_TO_TRANSFER then add a --exclude '*' It turns out that if you have no wildcards in your includes and an exclude '*' at the end, you will trigger an optimization in which the files are directly opened and most of the include/exclude processing is skipped. A side effect of this is that you don't actually need to include the parent directories, although it might not be a good idea to depend on that feature. --- This fails to to distribute the userf/fsubdir directory + /usera/ + /userf/fsubdir/ - /userf/* - /* It seems that rsync does not compare every file in the source to its include/exclude list. That is, once it finds a directory is excluded, it doesn't then test subdirectories. So in this case, it checked /users/userf, which didn't match "+ /userf/fsubdir/" but did match "- /*". Thus it didn't ever try /users/userf/fsubdir. This to oly include that sub-directory, but ignore anything else you need to use... + /usera/ + /userf/ + /userf/fsubdir/ - /userf/* - /* With this rsync tests /users/userf and finds the "+ /userf" so it creates that directory at the receiving end. It then tests /users/userf/fsubdir and finds "+ /userf/fsubdir/" and creates that and copies the included files. Anything else in /users/userf is excluded by the "- /userf/*". Essentually you need to include sub-directorys wanted and then exclude all the other parts you don't want! That is two lined for every parent directory leading to the file you want to actually include. --- Exclude filename or types globally You need to know that rsync applies each pattern both to individual name components as it visits them and to the entire paths at that point. So an exclude of ".*" will exclude all dot files in every directory but it will not exclude sub-directory under them which doesn't begin with dot. So if you say + .netscape/ - .* you will exclude things like .file sub/.file2 sub1/sub2/.file3 .netscape/.file4 but you will not exclude .netscape/file5 .netscape/dir1 because of your "+ .netscape/" So for top level, always start with / but for general filenames in any included directory, don't start with / ------------------------------------------------------------------------------- Delete before or after? The --delete option will recover disk space before starting transfers, which can be important when space is tight, however it has to scan the file system once before it even started transfers. That is two file system scans are needed. The --delete-after does the deletes after the transfers have finished, and as such rsync has already completed the main file system scan. As such it has a better performance. But space will not be freed until all the new files has been transfered, so you could run out of space when you actually have enough. ------------------------------------------------------------------------------- Can you delete source files after transfered ? Not automatically. But I've done things like: h=mailserver d=Maildir ( rsync -vaze ssh $h:$d/. $d/ 2>&1 | \ perl -lne 'print "'$d'/$_" if /^\S+[^\s\/]$/' ) | ssh $h perl -lne unlink I.e. do an rsync for the pull, tear the names out of the "rsync -v" output, and pipe 'em back over another ssh to delete the files. Bennett Todd WARNING: Rsync -v lists files that ARE being updated. It does not list which HAVE been update! That is it will also list failed transfers. As such if rsync aborts half way though a file, you could delete a file which was NOT FULLY TRANSFERED. Also the current file being transfered my not be the last listed as directories and file permissions may be updated in parallel. It also lists directorys before files (for directory creation). NOTE: the option --delete-after just delays the deletion off the temporary files on the *destination* side until after all the files have been transferred. It does NOT delete the file from the *source* side. You could to patch the source with a "--move-files" option to actually move files between machines. Wayne Davison UPDATE: see new --remove-source-files option ------------------------------------------------------------------------------- Setting up a secure ssh-rsync daemon on a remote machine. From rsync@samba.anu.edu.au Fri Dec 19 05:11:04 1997 From: reynhout@quesera.com Subject: Re: Using rsync with sshd "command="? Date: Fri, 19 Dec 1997 06:10:41 +1100 > I'd like to use rsync with the sshd facility of restricting a given > public key to a specific command with the "command=" facility of > authorized_keys. This is what I do to make this work: thalia is my desktop, with a big disk array. talulah is a remote box that I need to keep synchronized in case of disaster. >From thalia, I run rsync with the shell set to ssh_wrapper, which decides how to reach talulah, and runs ssh with the appropriate args (some of the remote boxes I use this for are behind an SSL proxy.) On talulah, sshd is hardwired to run rsync_wrapper, which examines the SSH_ORIGINAL_COMMAND environment variable, verifies that it begins with the full path of the rsync executable, and contains only normal, non-meta characters. In quick-hack perl: $line=$ENV{SSH_ORIGINAL_COMMAND}; if ( $line =~ /^\/opt\/bin\/rsync --server --sender / ) { # this regexp will need tweaking to handle unusual # (but legal) characters in paths. eg: [_\.] ($safeline=$line) =~ s|[^\w\s\d\-\/]||g; if ( $line ne $safeline ) { exit 1; } system("$line"); } else { exit 1; } If all these tests are passed, I just run SSH_ORIGINAL_COMMAND. If any are not passed, I exit. I used to print a diagnostic, but rsync on thalia just realized that it isn't getting what it wanted and said "Incorrect version information. Is your shell clean?" or something like that. It's important that these wrapper scripts DO NOT OUTPUT ANYTHING, even debugging info, status info, etc. Otherwise rsync will be interfered with. **It would be great if rsync would instead say: ** ** Bad response from remote rsync instance (is your shell clean?): ** ** **or similar. But for now I just trap the error code. In this manner, I can change the arguments passed to rsync, the directories, etc without having to worry about how it appears when it hits talulah. Fwiw, the command-line I end up running (right now) is: /opt/bin/rsync --server --sender -vnlHogDtprI --delete / From a security perspective, I'm relying on my wrapper script (on talulah) catching any clever attempts. I'm also allowing incoming ssh ONLY from thalia (for this user) and placing limitations on who has login (and physical) access to thalia. Subversion would require knocking thalia off the net, and getting thalia's private key. Even that should only allow you to get a copy of arbitrary files on talulah. It shouldn't allow any sort of shell access or execution of arbitrary commands. Anyone have any comments? D Andrew Reynhout reynhout@quesera.com "You've got your whole life to do something, reynhout@milkcrate.com and that's not very long..." -ani difranco ------------------------------------------------------------------------------- Network Connection Command You can pass arguments to the network connection command, For example... rsync {options} --rsh 'ssh -x' {files} Will run ssh WITHOUT it setting up the unneeded X windows communications. However rsync's parsing of the --rsh option is a highly simplistic "space" parsing. As such... rsync {options} --rsh 'ssh -x -o "BatchMode yes"' {files} will NOT work as expected. Ssh will receive the options '"BatchMode' and 'yes"' seperatally, neither of which are valid options. As a work around in this case, most ssh programs will accept a "=" instead of a space between option and its argument. (Only openssh-2.1 does not seem to allow this). As such for ssh you can aviod the problem with... rsync {options} --rsh 'ssh -x -o BatchMode=yes' {files} ------------------------------------------------------------------------------- RSync Limitations... Since rsync builds in memory and transfers the whole file list before it starts moving any files, a big tree causes (a) a large memory requirement (b) a long delay before any files starts moving. The amount of data is basically irrelevant for memory usage. The thing that matters is the total number of files. Rsync will use about 80 bytes per file at each end (this is very rough, take it as a rule of rule of thumb only). This is one of the bad consequences of developing rsync on a machine with 256MB of ram :) -- Tridge I've thought about this as well, and even toyed with writing a simple script that would call rsync for each top level directory in a similarly large tree -- but if a tree is relatively small this might actually make things slower. -- rsync user Alternative... "Unison" caches the directory information before hand, as such it knows about file deletions and even file moves. It also does not have to scan directories so heavily so tends to be a lot faster that rsync on large directory synchronization. However caching is only useful for regular file coping, and not so useful for once off coping, without some final cleanup of the cached data. Its purpose remember is for bi-directional synchronization, NOT uni-directional copying. However it does have flags to make it work in one direction only. ------------------------------------------------------------------------------- Limiting Network/Disk I/O -bwlimit=10000 # limit transfer to 10MB/s ------------------------------------------------------------------------------- Rsync over Tor via a SSH tunnel or socat #!/bin/bash #bash shell script for rsync retrieval of QSH #QSH Network Mirror #2 (AKA "C4") - c4q7xthv2kltrqso.onion port 873 socat TCP4-LISTEN:46055,fork \ SOCKS4A:localhost:c4q7xthv2kltrqso.onion:873,socksport=9050 \ >qsh_rsync_log.txt 2>&1 & echo "Please wait..." rsync -v -z --timeout=300 --contimeout=12 rsync://localhost:46055 ./qsh sleep 12s #kills socat kill %- -------------------------------------------------------------------------------