rsync over ssh copy

December 8, 2014

Introduction

If you are in the field of bioinformatics, you are most likely have to move large data files, genomes, exomes, ...

If you are moving this data between secured servers using Secure Shell, or SSH, you don't have too much options. From a GUI perspective you can use an SFTP such as FileZilla(Multiplatform) or winSCP(windows only), and on the command line side you can either use scp, sftp or rsync with ssh. It is the latter option that we are covering in this post, as it offers a greater bandwidth and results in much faster data transfers :clap:

Basic command

NOTE: rsync should be installed in both ends of the transfer.

The rsync basic command is very simple:

rsync [SOURCE] [DESTINATION]

Obviously if you are trying to copy to a remote location the command will look like:

rsync [SOURCE] [rsync://[USER@]HOST[:PORT]/DEST] #the port isn't necessary

Now I like to introduce some of the options that I always use:

  • -v: for the verbose mode to keep track of what's going on.
  • -c: for checksum, to make sure nothing was lost during the process.
  • -P: for progress, to follow the ongoing syncing.
  • -e: specifies the remote shell to use

So the command will become:

rsync -vcP [SOURCE] [DESTINATION]

Speedup command

Now if you are note transferring super secured data -hence HIPAA protected data- you can use the arcfour cipher (weaker than the default) which is less cpu-bound:

rsync -vcP -e "ssh -c arcfour" [SOURCE] [DESTINATION]

To transfer large data files such as genomes or exomes for the first time to the [DESTINATION] server, use the --inplace option, which accepts resuming interrupted transfers in case of network issues:

rsync -vcP --inplace -e "ssh -c arcfour" [SOURCE] [DESTINATION]
Note: You can achieve faster transfers using OpenSSH HPN an ssh upgrad for high performance networks. Read here.

Source: Blazing Fast File Transfer with Rsync and SSH HPN

Comments