Institute of Bioinformatics and Systems Biology / MIPS


Gepard

Gepard (German for "cheetah"; a backronym for "GEnome PAir - Rapid Dotter") calculates dotplots even for large sequences such as chromosomes or bacterial genomes.


Reference:

Krumsiek J, Arnold R, Rattei T. Gepard: A rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 2007; 23(8): 1026-8. PMID: 17309896

Use cases

  1. Online comparison of partial or complete genomes from the PEDANT databases without downloading any sequence data.
  2. Online comparison of a user-supplied nucleotide sequence against a genome from the PEDANT database.
  3. Local comparison of two nucleotide or amino acid sequences from user-specified files.
  4. Batch dotplot creation via command-line access to Gepard.


Features

  • Rapid calculation of dotplots (under 2 min for an E. coli self-plot on a standard computer)
  • Preconfigured parameters: simply select two sequences and create the dotplot (3 clicks)
  • Easy-to-use interface (mouse zooming, context-sensitive help)
  • Image exports (multiple formats)
  • Should run on any common operating system, since Gepard is written in Java
  • Genes covered by the dotplot are linked to their report webpages in the PEDANT database
  • Coloring of genes by functional classification (uses data from PEDANT)
  • Persistent storage of suffix arrays (avoids recalculation)

Gepard guarantees the privacy of all input data, does not store user data remotely and does not contain any form of malware.

Screenshot



Gepard application in remote mode displaying a dotplot of Escherichia coli vs. Shigella flexneri with colored functional annotations.

System requirements

Gepard requires the Java Runtime Environment Version 5.0 or later (http://www.java.com/download/).

It has been tested on the following operating systems:

  • Microsoft Windows 2000 & XP
  • KDE 3+4 on Linux/Un*x system
  • MacOS 10.x


Download

Latest version: 1.30 (Version changes)

  1. Java Web Start - The most convenient way to launch Gepard. Click one of the following links and Java Web Start will take care of downloading and starting Gepard. This also ensures that you are always running the latest version of Gepard.

    Note: Gepard requires special security rights (such as access to the file system). You therefore have to accept the certificate that is shown when launching the program.

    For details on the launch options for different amounts of available memory, see "Memory issues / Vmatch support" below.

    Launch normal version (512MB)
    Launch low memory version (256MB)
    Launch high memory version (1024MB)


  2. Download the program

    • Download archive - Download a compressed archive containing the required JAR files, startup scripts and an offline version of the tutorial.

      gepard-1.30.zip
      gepard-1.30.tar.gz

    • Download JNLP file - You can also save the Java Web Start descriptor files (see above) to your computer. Once all required data has been downloaded, the program will run without an internet connection. Right-click on the links and select "Save as" from the pop-up menu.


Bugs

All known bugs should be fixed in the latest release of Gepard. Thanks to the anonymous bug report senders!

Source code

To get a copy of Gepard's source code please

Tutorial

Read the tutorial online. An offline version of the tutorial is included in the download archive above.

Method

Gepard uses suffix arrays for rapid heuristic dotplot calculation. For large dotplots, it searches for exact word matches of a fixed length (10 by default) from one sequence in the suffix array of the other sequence. Because any word of fixed length can be located in O(log n) time in a suffix array by binary search, this method reduces the complexity of the dotplot calculation from O(m*n) to O(m * log n), where n is the length of the longer and m the length of the shorter sequence. For small dotplots, the classical window-based dotplot calculation is used.
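The word-matching idea can be illustrated with a small Java sketch. It is not Gepard's code: the suffix array here is built by naively sorting suffixes (not with the Skew algorithm), and the names `SuffixArrayDotplot`, `wordMatchDots` and `windowDots` are hypothetical. Each word of length k from the shorter sequence `a` is located by binary search in the suffix array of the longer sequence `b`, and every occurrence yields a dot (i, j). For comparison, `windowDots` implements the classical window-based method; with window = threshold = k it reports the same dots as exact word matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only: naive suffix array, exact k-mer matches, dots as {i, j}. */
public class SuffixArrayDotplot {

    /** Builds a suffix array by sorting suffix start positions (naive, for short inputs only). */
    static Integer[] buildSuffixArray(String s) {
        Integer[] sa = new Integer[s.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (x, y) -> s.substring(x).compareTo(s.substring(y)));
        return sa;
    }

    /** Compares the first w.length() characters of the suffix starting at pos with w. */
    static int comparePrefix(String text, int pos, String w) {
        int end = Math.min(pos + w.length(), text.length());
        return text.substring(pos, end).compareTo(w);
    }

    /** Binary search: index of the first suffix whose prefix is >= w. */
    static int lowerBound(String text, Integer[] sa, String w) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(text, sa[mid], w) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    /** Dots (i, j) where a[i..i+k) equals b[j..j+k), found via the suffix array of b; sorted by i, then j. */
    static List<int[]> wordMatchDots(String a, String b, int k) {
        Integer[] sa = buildSuffixArray(b);
        List<int[]> dots = new ArrayList<>();
        for (int i = 0; i + k <= a.length(); i++) {
            String w = a.substring(i, i + k);
            // All suffixes starting with w form one contiguous block in the suffix array.
            for (int r = lowerBound(b, sa, w); r < sa.length && b.startsWith(w, sa[r]); r++) {
                dots.add(new int[] {i, sa[r]});
            }
        }
        dots.sort(Comparator.comparingInt((int[] d) -> d[0]).thenComparingInt(d -> d[1]));
        return dots;
    }

    /** Classical O(m*n) window dotplot: dot (i, j) if at least threshold of window aligned positions match. */
    static List<int[]> windowDots(String a, String b, int window, int threshold) {
        List<int[]> dots = new ArrayList<>();
        for (int i = 0; i + window <= a.length(); i++) {
            for (int j = 0; j + window <= b.length(); j++) {
                int matches = 0;
                for (int t = 0; t < window; t++) {
                    if (a.charAt(i + t) == b.charAt(j + t)) matches++;
                }
                if (matches >= threshold) dots.add(new int[] {i, j});
            }
        }
        return dots;
    }

    public static void main(String[] args) {
        // Toy sequences; the word length 4 is chosen for readability (Gepard's default is 10).
        for (int[] d : wordMatchDots("ACGTACGT", "TTACGTACGTAA", 4)) {
            System.out.println(d[0] + "," + d[1]);
        }
    }
}
```

For the toy input this reports nine dots, five of which lie on the diagonal i - j = -2 that marks the shared region. The naive suffix array construction is far too slow for genomes; it only stands in for a real construction algorithm so that the search step can be shown in isolation.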

Memory issues / Vmatch support

The program uses the "Skew" algorithm to calculate suffix arrays. This algorithm is very memory-intensive, so Gepard may require a large amount of available memory.
Unfortunately, on all operating systems the maximum amount of memory available to the Java VM has to be fixed at startup.
This is why there are different startup scripts for machines with different amounts of memory.

The following table shows the approximate maximum sequence size (assuming a self-plot) for each memory setting. This includes both suffix array and dot matrix calculation.

  Memory setting   Approx. maximum sequence size (self-plot)
  256 MB           ~10 million base pairs
  512 MB           ~20 million base pairs
  1024 MB          ~40 million base pairs
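Because the heap limit is fixed when the JVM starts (with the standard Java launcher this is typically done via the `-Xmx` option, e.g. `-Xmx512m`), it can help to choose the setting before launching. The following hypothetical helper (`GepardMemoryHint` is not part of Gepard) encodes the approximate limits from the table and reports the heap the running JVM was actually given via `Runtime.maxMemory()`.

```java
/** Hypothetical helper: maps a self-plot sequence length to the table's memory settings. */
public class GepardMemoryHint {
    // Approximate limits copied from the table; real limits depend on the data and JVM.
    static final int[] HEAP_MB = {256, 512, 1024};
    static final long[] MAX_BP = {10_000_000L, 20_000_000L, 40_000_000L};

    /** Smallest heap setting (MB) whose approximate limit covers seqLength, or -1 if none does. */
    static int recommendedHeapMb(long seqLength) {
        for (int i = 0; i < HEAP_MB.length; i++) {
            if (seqLength <= MAX_BP[i]) return HEAP_MB[i];
        }
        return -1; // larger than all listed settings; consider using Vmatch
    }

    public static void main(String[] args) {
        long length = 15_000_000L; // example: a 15 Mbp self-plot
        // maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Suggested setting: " + recommendedHeapMb(length) + " MB");
        System.out.println("Max heap of this JVM: " + maxHeapMb + " MB");
    }
}
```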


Gepard supports the program "mkvtree" from the Vmatch package, which calculates persistent suffix arrays very quickly and with very little memory. Gepard will automatically use this external binary if it can be found in the program's directory or in one of the directories listed in the PATH environment variable.

If you are using Vmatch with Gepard you may run the low-memory version of Gepard as the mkvtree binary will run outside the Java VM.
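A lookup of this kind can be sketched in Java as follows. This is an illustration, not Gepard's implementation: `FindMkvtree` and `findExecutable` are hypothetical names, checking the program directory before `PATH` is an assumption, and the current working directory (`user.dir`) stands in for the program directory. On Windows the binary name would normally need an `.exe` suffix, which this sketch does not add.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical sketch: locate an external binary in a program directory or on PATH. */
public class FindMkvtree {

    static boolean isExecutableFile(Path p) {
        return Files.isRegularFile(p) && Files.isExecutable(p);
    }

    /** Checks programDir first, then each directory listed in pathVar (PATH syntax). */
    static Optional<Path> findExecutable(String name, Path programDir, String pathVar) {
        if (programDir != null && isExecutableFile(programDir.resolve(name))) {
            return Optional.of(programDir.resolve(name));
        }
        if (pathVar != null) {
            for (String dir : pathVar.split(File.pathSeparator)) {
                if (dir.isEmpty()) continue; // skip empty PATH entries
                Path candidate = Paths.get(dir, name);
                if (isExecutableFile(candidate)) return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path programDir = Paths.get(System.getProperty("user.dir")); // stand-in for the program directory
        Optional<Path> mkvtree = findExecutable("mkvtree", programDir, System.getenv("PATH"));
        System.out.println(mkvtree.map(p -> "Found mkvtree at " + p)
                .orElse("mkvtree not found; suffix arrays would be computed inside the JVM"));
    }
}
```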

Contact



Last change: Feb, 2010