rsync over ssh copy
December 8, 2014
Introduction
If you are in the field of bioinformatics, you are most likely have to move large data files, genomes, exomes, ...
If you are moving this data between secured servers using Secure Shell, or SSH, you don't have too much options. From a GUI perspective you can use an SFTP such as FileZilla(Multiplatform) or winSCP(windows only), and on the command line side you can either use scp
, sftp
or rsync
with ssh. It is the latter option that we are covering in this post, as it offers a greater bandwidth and results in much faster data transfers :clap:
Basic command
NOTE: rsync should be installed in both ends of the transfer.
The rsync basic command is very simple:
rsync [SOURCE] [DESTINATION]
Obviously if you are trying to copy to a remote location the command will look like:
rsync [SOURCE] [rsync://[USER@]HOST[:PORT]/DEST] #the port isn't necessary
Now I like to introduce some of the options that I always use:
- -v: for the verbose mode to keep track of what's going on.
- -c: for checksum, to make sure nothing was lost during the process.
- -P: for progress, to follow the ongoing syncing.
- -e: specifies the remote shell to use
So the command will become:
rsync -vcP [SOURCE] [DESTINATION]
Speedup command
Now if you are note transferring super secured data -hence HIPAA protected data- you can use the arcfour cipher (weaker than the default) which is less cpu-bound:
rsync -vcP -e "ssh -c arcfour" [SOURCE] [DESTINATION]
To transfer large data files such as genomes or exomes for the first time to the [DESTINATION] server, use the --inplace option, which accepts resuming interrupted transfers in case of network issues:
rsync -vcP --inplace -e "ssh -c arcfour" [SOURCE] [DESTINATION]