PANTHER
Content | |
---|---|
Description | The PANTHER database classifies gene products into families |
Data types captured | Gene families |
Contact | |
Research center | University of Southern California |
Authors | Paul D Thomas |
Primary citation | PMID 12520017 |
Access | |
Website | [1] |
Miscellaneous | |
Bookmarkable entities | yes |
In bioinformatics the PANTHER (Protein ANalysis THrough Evolutionary Relationships) classification system is a biological database that can be used to classify and identify the function of gene products.[1] PANTHER is part of the Gene Ontology Reference Genome Project[2] designed to classify proteins and their genes for high-throughput analysis.
The project consists of both manual curation and bioinformatics algorithms.[3] Proteins are classified according to family (and subfamily), molecular function, biological process and pathway. It is one of the databases feeding into the European Bioinformatics Institute's InterPro database.[4]
PANTHER
Protein ANalysis THrough Evolutionary Relationships (PANTHER) is a database that aims to deliver high-throughput analysis of protein sequences. One notable feature of PANTHER is that users can browse the database by protein function. Another is that PANTHER displays the protein sequences of each subfamily as a phylogenetic tree.
Application of PANTHER:
The most important application of PANTHER is to accurately infer the function of uncharacterized genes from any organism based on their evolutionary relationships to genes with known functions. By combining gene function, ontology, pathways and statistical analysis tools, PANTHER enables biologists to analyze large-scale, genome-wide data obtained from current technologies such as sequencing, proteomics or gene expression experiments.
PANTHER History
- 1998: Project launched at Molecular Applications Group.
- 1999: Acquired by Celera Genomics.
- 2000: PANTHER 1 released in Celera Discovery Systems (CDS).
- 2001: PANTHER 2 released and used in the annotation of the first published Celera human genome.
- 2002: PANTHER 3 released. PANTHER annotations integrated into FlyBase. Moved to ABI.
- 2003: PANTHER 4 released with the public release of the PANTHER Classification System.
- 2005: PANTHER 5 released with PANTHER Pathway and analysis tools. Collaboration with InterPro established.
- 2006: PANTHER 6 released. Moved to SRI.
- 2010: PANTHER 7 released.
- 2011: Moved to USC.
- 2012: PANTHER 8 released.
- 2014: PANTHER 9 released.
Phylogenetic Tree
In PANTHER there is a phylogenetic tree for each protein family. Each node in the tree is annotated with gene attributes: subfamily membership, protein class and gene function. Internal nodes also record the evolutionary event they represent, i.e. speciation, gene duplication or horizontal gene transfer. In addition, the trees are annotated with Gene Ontology (GO) terms; this GO annotation is part of the Reference Genome Project. To generate the phylogenetic trees PANTHER uses the GIGA algorithm. GIGA uses the species tree to guide tree construction, and at every iteration it attempts to reconcile the tree in terms of speciation and gene duplication events.
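The idea of labelling internal nodes with an evolutionary event can be illustrated with a small sketch. This is not the GIGA algorithm; it uses the simpler species-overlap rule, which is swapped in here purely for illustration: a node is called a duplication if the same species appears under more than one of its children, and a speciation otherwise. Horizontal gene transfer is not modelled. The `Node` class, the gene names and the toy family are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A node in a gene tree; only leaves carry a species name."""
    name: str
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    event: Optional[str] = None  # filled in for internal nodes


def label_events(node: Node) -> Set[str]:
    """Label internal nodes and return the set of species below `node`."""
    if not node.children:
        return {node.species}
    seen: Set[str] = set()
    overlap = False
    for child in node.children:
        child_species = label_events(child)
        if seen & child_species:
            overlap = True  # same species on two sides: duplication
        seen |= child_species
    node.event = "duplication" if overlap else "speciation"
    return seen


# Hypothetical toy family: two human paralogs and one mouse ortholog.
tree = Node("root", children=[
    Node("n1", children=[
        Node("geneA_human", species="human"),
        Node("geneA_mouse", species="mouse"),
    ]),
    Node("geneB_human", species="human"),
])
label_events(tree)
print(tree.event)              # duplication
print(tree.children[0].event)  # speciation
```

Here the root is labelled a duplication because human genes occur on both sides of it, while `n1` separates a human gene from a mouse gene and is labelled a speciation.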
PANTHER Library Data Generation Process
The data generation process is divided into three steps:
- Family Clustering
- Phylogenetic Tree Building
- Annotation of Tree Nodes
Family Clustering
Sequence Set:
PANTHER trees depict gene family evolution across a broad selection of fully sequenced genomes. PANTHER includes one sequence per gene so that each tree represents the events that occurred over the course of evolution, i.e. duplication and speciation. The PANTHER genome set is selected using the following criteria. The set should include the major experimental model organisms, which helps in inferring functional information for less-studied organisms. The set should also include a broad taxonomic range of other genomes, preferably fully sequenced and annotated, which helps in relating the experimental model organisms to one another.
Family Clusters:
The requirements for family clusters are as follows:
- The family must contain at least five members in total, with at least one member from a Gene Ontology (GO) reference genome.
- The sequence alignment must be of good enough quality to support phylogenetic inference.
- The quality of the multiple sequence alignment is assessed by the length of the aligned region: at least 30 sites must be aligned across 75% or more of the family members.
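The requirements above can be sketched as a simple check. This is an illustration under stated assumptions, not PANTHER's actual implementation: the alignment is assumed to be a dictionary of equal-length gapped sequences with `-` as the gap character, and the function names, gene names and the set of GO reference species are hypothetical.

```python
from typing import Dict, Set

MIN_MEMBERS = 5             # at least five family members
MIN_ALIGNED_SITES = 30      # at least 30 well-aligned sites
MIN_MEMBER_FRACTION = 0.75  # a site counts if 75% or more of members align


def well_aligned_sites(alignment: Dict[str, str]) -> int:
    """Count columns in which at least 75% of members have a residue."""
    sequences = list(alignment.values())
    count = 0
    for column in zip(*sequences):
        aligned = sum(1 for ch in column if ch != "-")
        if aligned / len(sequences) >= MIN_MEMBER_FRACTION:
            count += 1
    return count


def is_valid_family(alignment: Dict[str, str],
                    species_of: Dict[str, str],
                    go_reference_species: Set[str]) -> bool:
    """Apply the three family-cluster requirements in turn."""
    if len(alignment) < MIN_MEMBERS:
        return False
    if not any(species_of[g] in go_reference_species for g in alignment):
        return False
    return well_aligned_sites(alignment) >= MIN_ALIGNED_SITES


# Hypothetical family: 35 shared columns plus a 5-residue tail in one gene.
family = {f"gene{i}": "A" * 35 + "-----" for i in range(2, 6)}
family["gene1"] = "A" * 35 + "CCCCC"
species_of = {"gene1": "human", "gene2": "mouse", "gene3": "zebrafish",
              "gene4": "fly", "gene5": "worm"}
go_reference = {"human", "mouse"}  # assumed subset, for illustration only

print(well_aligned_sites(family))                         # 35
print(is_valid_family(family, species_of, go_reference))  # True
```

The tail columns are present in only one of five members (20%), so they do not count towards the 30-site threshold.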
Phylogenetic Tree Building
For each family, the sequences are aligned using the default settings of MAFFT, and any column in which fewer than 75% of the sequences are aligned is removed. The trimmed alignment is then used as input for the GIGA program. The output trees from GIGA are labelled: each internal node is marked according to whether the divergence event was a speciation or a gene duplication.
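The column-trimming step can be sketched as follows. The sketch assumes the alignment has already been produced (for example by MAFFT) and is available as a list of equal-length strings with `-` as the gap character; the function name `trim_alignment` and the toy sequences are hypothetical.

```python
from typing import List


def trim_alignment(sequences: List[str],
                   min_fraction: float = 0.75,
                   gap: str = "-") -> List[str]:
    """Drop columns in which fewer than `min_fraction` of sequences align."""
    if not sequences:
        return []
    n = len(sequences)
    keep = [
        i for i, column in enumerate(zip(*sequences))
        if sum(ch != gap for ch in column) / n >= min_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in sequences]


aligned = [
    "MK-LV",
    "MKALV",
    "MK-LI",
    "M--LV",
]
print(trim_alignment(aligned))  # ['MKLV', 'MKLV', 'MKLI', 'M-LV']
```

The third column is aligned in only one of four sequences and is removed; the second column is aligned in exactly 75% of the sequences and is kept.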
Annotation of tree nodes
Each node in a PANTHER tree is annotated with heritable attributes. A heritable attribute can be one of three types: subfamily membership, gene function or protein class membership. These node annotations are applied to the primary sequences that were used to construct the tree. In applying the annotations to the primary sequences a simple evolutionary principle is used, i.e. each node's annotation is propagated to its descendant nodes.
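The propagation principle described above can be sketched as a top-down tree traversal. This is a minimal illustration, not PANTHER's code: the `TreeNode` class, the annotation strings and the gene names are hypothetical, and the sketch does not model a descendant overriding or losing an inherited annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class TreeNode:
    name: str
    annotations: Set[str] = field(default_factory=set)
    children: List["TreeNode"] = field(default_factory=list)


def propagate(node: TreeNode,
              inherited: FrozenSet[str] = frozenset(),
              result: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """Return, for every leaf, its own plus all inherited annotations."""
    if result is None:
        result = {}
    current = set(inherited) | node.annotations
    if not node.children:
        result[node.name] = current  # leaves are the primary sequences
    for child in node.children:
        propagate(child, frozenset(current), result)
    return result


# Hypothetical family tree with annotations on two internal nodes.
root = TreeNode("root", {"protein class: kinase"}, [
    TreeNode("n1", {"subfamily: SF1"}, [
        TreeNode("geneA_human"),
        TreeNode("geneA_mouse"),
    ]),
    TreeNode("geneB_human"),
])

for leaf, attrs in propagate(root).items():
    print(leaf, sorted(attrs))
```

Both `geneA` leaves receive the protein class from the root and the subfamily from `n1`, while `geneB_human` receives only the protein class.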
PANTHER Components
PANTHER/LIB (PANTHER library): The library consists of a collection of books. Each book represents a protein family as a multiple sequence alignment, an HMM and a family tree.
PANTHER/X (PANTHER index): The index contains an abbreviated ontology which assists in summarizing and navigating molecular and biological functions.
PANTHER Pathways:
PANTHER includes 176 pathways drawn using the CellDesigner tool. PANTHER pathways can be downloaded in the following file formats:
- Systems Biology Markup Language (SBML)
- Systems Biology Graphical Notation (SBGN-ML)
- BioPAX
How-To
Search By Keyword:
- Enter a search term in the text box and press ‘Go’. A wildcard search can be performed with ‘*’, e.g. her* matches hero.
- The page displays the number of records matching the keyword. Click on the number to view information on all the matching genes. Results can be filtered by species.
- Click on a gene identifier to view its detail page.
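The behaviour of a ‘*’ wildcard can be illustrated locally with Python's standard `fnmatch` module. This sketch does not query the PANTHER website; the record names are hypothetical, and whole-string, case-insensitive matching is an assumption about how the search behaves.

```python
from fnmatch import fnmatchcase
from typing import List


def keyword_search(term: str, records: List[str]) -> List[str]:
    """Return records matching `term`, where '*' matches any run of characters."""
    pattern = term.lower()
    return [r for r in records if fnmatchcase(r.lower(), pattern)]


records = ["hero", "HER2", "heat shock protein", "kinase"]
print(keyword_search("her*", records))  # ['hero', 'HER2']
```

Note that `fnmatchcase` also treats `?` and `[...]` as special characters, which may differ from the search syntax of the site.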
Recent versions of PANTHER and their Statistics and Updates:
Version 6.0 (2006):
Version 6 uses UniProt [14] sequences as training sequences. There are 19132 UniProt training sequences directly associated with the pathway components. This version has ~1500 reactions in 130 pathways, and the number of pathways associated with subfamilies was expanded. PANTHER became a member of the InterPro Consortium. The availability of PANTHER data was improved (the HMMs can be downloaded by FTP). PANTHER/LIB version 6.1 contains 221609 UniProt sequences from 53 organisms, grouped into 5546 families and 24561 subfamilies.
Version 7.0 and 7.2 (2009):
In this version the phylogenetic trees represent speciation and gene duplication events, which makes it possible to identify gene orthologs. Support for alternative database identifiers for genes, proteins and microarray probes was extended. PANTHER version 7 uses the SBGN standard to depict biological pathways. Version 7 includes a set of 48 genomes. In collaboration with the European Bioinformatics Institute’s InterPro group, approximately 1000 new families from non-animal genomes were added in this version. The sources of gene sets included model organism databases, Ensembl [11] genome annotation and Entrez Gene [12]. Since this version, each node in the tree has been assigned a stable identifier: a nine-digit number with the prefix PTN (standing for PANTHER tree node).
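The stable identifier format described above (the prefix PTN followed by nine digits) can be checked with a short regular expression. The function name and the example identifiers are made up for illustration, and case-sensitive matching of the prefix is an assumption.

```python
import re

# "PTN" followed by exactly nine ASCII digits, e.g. PTN000000001 (made-up example).
PTN_PATTERN = re.compile(r"PTN[0-9]{9}")


def is_ptn_identifier(text: str) -> bool:
    """Return True if `text` has the form of a PANTHER tree node identifier."""
    return PTN_PATTERN.fullmatch(text) is not None


print(is_ptn_identifier("PTN000000001"))  # True
print(is_ptn_identifier("PTN12345"))      # False
```

The check only validates the format; it does not confirm that a node with that identifier exists in any PANTHER release.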
Version 8.0 (2012):
This version of PANTHER uses the reference proteome set maintained by the UniProt resource (http://www.ebi.ac.uk/reference_proteomes/). It includes a set of 82 genomes (approximately double the number in version 7) and 991985 protein-coding genes, of which 642319 (64.75%) have been used for family clusters. The source of the gene sets is UniProt.
Version 9.0:
This version contains 7180 protein families, divided into 52,768 functionally distinct protein subfamilies. Version 9.0 covers the genomes of 85 organisms.
References
- Thomas, PD.; Kejariwal, A.; Campbell, MJ.; Mi, H.; Diemer, K.; Guo, N.; Ladunga, I.; Ulitsky-Lazareva, B.; et al. (Jan 2003). "PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification". Nucleic Acids Res. 31 (1): 334–41. doi:10.1093/nar/gkg115. PMC 165562. PMID 12520017.
- GO Reference Genome Annotation Project
- Mi, H.; Muruganujan, A.; Thomas, PD. (Jan 2013). "PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees". Nucleic Acids Res. 41 (Database issue): D377–86. doi:10.1093/nar/gks1118. PMID 23193289.
- Hunter, S.; Jones, P.; Mitchell, A.; Apweiler, R.; Attwood, TK.; Bateman, A.; Bernard, T.; Binns, D.; et al. (Jan 2012). "InterPro in 2011: new developments in the family and domain prediction database". Nucleic Acids Res. 40 (Database issue): D306–12. doi:10.1093/nar/gkr948. PMC 3245097. PMID 22096229.