institute of bioinformatics and systems biology / mips


The MIPS cDNA Consortium Group -- The PRIMATOR Databases

 

Overview

The complete assignment of all protein-coding and non-protein-coding transcripts of the human genome is still a major challenge for today's “omics”. A comparative analysis of additional transcripts from species closely related to humans, such as orangutan, gorilla and chimpanzee, is expected to yield valuable information for a better understanding of human disease-related splice variants and evolutionarily conserved gene structures.
Therefore, a pipeline for the analysis, annotation and presentation (PRIMATOR) of previously uncharacterized orangutan (Pongo pygmaeus) and human transcripts has been established at MIPS (Munich Information Center for Protein Sequences) within the German cDNA Consortium.

The Core Annotation Databases

The core annotation database provides manually added information for every cDNA regarding its open reading frame (ORF) property, clone quality and chromosomal localization, together with a short description of the protein. All attributes used in the annotation process can also be used to select clones of interest via a user-friendly search page.
The use of a defined terminology and a standardized guideline developed at MIPS ensures a consistent and reliable cDNA annotation.
Genomic alignments of all cDNAs are made available as custom tracks for the UCSC Genome Browser.
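To illustrate how genomic alignments can be expressed as a UCSC custom track, the following is a minimal sketch in Python that writes a track line followed by six-column BED records. The Alignment type, the to_custom_track function, the track name and all clone IDs and coordinates are assumptions made for illustration; they are not taken from the PRIMATOR pipeline.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one cDNA-to-genome alignment."""
    clone_id: str
    chrom: str
    start: int   # 0-based start, as BED expects
    end: int     # end position, exclusive
    strand: str  # "+" or "-"


def to_custom_track(alignments: Iterable[Alignment],
                    name: str = "cDNA_alignments",
                    description: str = "cDNA genomic alignments") -> str:
    """Return custom-track text: one track line, then one BED line per alignment."""
    lines = [f'track name="{name}" description="{description}" visibility=pack']
    # Sort by chromosome and start so the output order is deterministic.
    for a in sorted(alignments, key=lambda a: (a.chrom, a.start)):
        if a.start < 0 or a.end <= a.start:
            raise ValueError(f"invalid interval for {a.clone_id}")
        # BED columns: chrom, chromStart, chromEnd, name, score, strand
        lines.append("\t".join(
            [a.chrom, str(a.start), str(a.end), a.clone_id, "0", a.strand]))
    return "\n".join(lines) + "\n"


# Placeholder data only; these are not real clones or coordinates.
EXAMPLE_ALIGNMENTS = [
    Alignment("CLONE_B", "chr2", 500, 900, "-"),
    Alignment("CLONE_A", "chr1", 100, 400, "+"),
]
```

The resulting text can be pasted into, or uploaded on, the custom track page of the Genome Browser. Blocked BED records (12 columns) would be needed to show individual exons; this sketch only marks the overall span of each alignment.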


The Disease Sets

A nonredundant set of PRIMATOR cDNAs (human and orangutan) has been mapped to UniProt entries containing a disease note. So far, 454 human and 443 orangutan cDNAs have been identified this way. The interface provides a search option by clone ID, SwissProt ID or disease name.
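The three search options can be pictured as filters over a table of mapped cDNAs. The following Python sketch assumes a simple in-memory record layout; the DiseaseEntry type, the search function and all IDs and disease names are placeholders invented for illustration and do not reflect the actual database schema or its contents.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DiseaseEntry:
    """Hypothetical row linking one cDNA clone to a UniProt disease note."""
    clone_id: str
    swissprot_id: str
    species: str
    disease: str


def search(entries: Iterable[DiseaseEntry],
           clone_id: Optional[str] = None,
           swissprot_id: Optional[str] = None,
           disease: Optional[str] = None) -> List[DiseaseEntry]:
    """Return entries matching every criterion that was supplied.

    IDs are compared exactly (ignoring case); the disease name is a
    case-insensitive substring match.
    """
    hits = []
    for e in entries:
        if clone_id is not None and e.clone_id.lower() != clone_id.lower():
            continue
        if swissprot_id is not None and e.swissprot_id.lower() != swissprot_id.lower():
            continue
        if disease is not None and disease.lower() not in e.disease.lower():
            continue
        hits.append(e)
    return hits


# Placeholder records only; not real clones, accessions or diseases.
EXAMPLE_ENTRIES = [
    DiseaseEntry("CLONE_0001", "X00001", "human", "example disease A"),
    DiseaseEntry("CLONE_0002", "X00001", "orangutan", "example disease A"),
    DiseaseEntry("CLONE_0003", "X00002", "human", "example disease B"),
]
```

For example, search(EXAMPLE_ENTRIES, swissprot_id="x00001") returns the first two placeholder records, one human and one orangutan clone mapped to the same entry.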

 

 
The pESTdb

As a whole-genome scaffold of the orangutan (Pongo pygmaeus) will not be available in the near future, comparisons of genes between human and its closest relatives chimpanzee, bonobo and orangutan are limited by the amount of data available for orangutan. To address this, a collection of EST-based transcript tags has been created by the German cDNA Consortium, providing a resource for genome-scale bioinformatic analysis. The primate EST database (pESTdb) is based on assembled transcript tags, created from 5'-end single-pass sequences of 34,006 randomly selected orangutan cDNA clones from 4 different tissues.
For each transcript tag, possible orthologous proteins, genome-based mapping and clustering, functional categorization based on FunCat and observed expression levels are shown. The tissues used for the generation of the cDNA libraries are derived from the subspecies Pongo pygmaeus abelii (Sumatran orangutan) and were provided by the Max-Planck Institute for Evolutionary Anthropology.