'liftOver' genomic coordinates

January 18, 2015

Introduction

With SNP data coming in various formats and genome builds (e.g. for human genomes hg18/hg19). There is the need to compare this data, combine it annotate or simply can't use it with a tool that supports only hg19, or vice-versa. There are multiple correct answers to this question, most of which are more or less simple to use.

Solutions

The solution depends on what kind of data you have, its size and the infrastructure you are using. You can split the specter of available tools into two main groups:

UCSC liftOver and derivatives:

  1. UCSC liftOver: liftOver is available as a webapp that you can use to do your conversion. It is also available as a command line tool, that requires JDK which could be a limitation for some. Add to that the tool is only free for research purposes and involves a $1000 one-time fee for commercial applications. Another important limitation is its support only bed files, which turns to be totaly impractical in NGS times.

  2. Galaxy: Supports BED, GFF, GTF input.) (Based on UCSC liftOver tool) supports BED, GFF, GTF input.

  3. pyliftover: Very limited since it “only does conversion of point coordinates, that is, unlike liftOver, it does not convert ranges, nor does it provide any special facilities to work with BED files”. But it frees the user from the jdk requirement.

  4. rtracklayer: A bioconductor package that implements liftOver and has the same functionalities as liftOver

  5. flo: A pipeline implementation for different reference genome builds of the same species. A rather complicated solution for a simple conversion.

Other solutions:

  1. NCBI Remap: It has similar functionalities as liftOver, but the conversion process is different. It has a webapp, and it is also available as an API for NCBI Remap.

  2. The Ensembl API: Part of Ensembl core API, it offers 'Coordinates', 'Coordinate systems', 'Transform', and 'Transfer'. Ensembl also has a simple web service for coordinate conversions.

  3. CrossMap: My personal favorite solution, it is a "CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as Human hg18 (NCBI36) <> hg19 (GRCh37), Mouse mm9 (MGSCv37) <> mm10 (GRCm38))". Its major strength over the other tools, is its supports of a wide range of file formats, especially NGS related file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF. It is also python based and arguably faster than any other option especially when dealing with beg BAM files.

Final thoughts

If you are mostly dealing with small datasets and don't have a command line knowledge, you are using a non-unix based system, or simply don't have the right permissions on the machine I recommend CrossMap.

Comments