Ensembl Genomes: Difference between revisions
Revision as of 08:53, 11 September 2014
Content | |
---|---|
Description | An integrative resource for genome-scale data from non-vertebrate species. |
Data types captured | Genomic database |
Organisms | pan |
Contact | |
Research center | European Bioinformatics Institute |
Primary citation | Kersey & al. (2012)[1] |
Release date | 2009 |
Access | |
Website | http://ensemblgenomes.org/ |
Miscellaneous | |
Data release frequency | 4/5 times per year |
Version | Release 23 (July 2013) |
Ensembl Genomes is a scientific project to provide genome-scale data from non-vertebrate species.[1] The project is run by the European Bioinformatics Institute, and was launched in 2009 using the Ensembl technology.[2] The main objective of the Ensembl Genomes database is to complement the main Ensembl database by introducing five additional web pages to include genome data for bacteria, fungi, invertebrate metazoa, plants, and protists.[3] For each of the domains, the Ensembl tools are available for manipulation, analysis and visualization of genome data.
Ensembl Genomes is an open project, and most of the code, tools, and data are available to the public[4]. Ensembl and Ensembl Genomes software uses a "permissive Apache-style open-source"[5] license, making it free for all users.
Using Ensembl Genomes
Ensembl Genomes allows users to find specific gene or DNA sequence data, or whole karyotypes. Genes can be searched by name, gene symbol, accession number or variant ID. Genomes can also be searched by species using the corresponding binomial nomenclature. Searching for a particular species in Ensembl Genomes redirects to a new page that shows all the tools and information available for that species. In general, each result page contains the following sections:
- Genome Assembly
- Comparative Genomics
- Gene annotation
- Variation
A karyotype is available for some species in Ensembl Genomes.[6] If the karyotype is available, there will be a link to it in the Genome Assembly section of the species page. Alternatively, users who are in the ‘Location’ tab can view the karyotype by selecting ‘Whole genome’ in the left-hand menu. Users can click on a location within the karyotype to zoom in to one specific chromosome or genomic region.[6] This will open the ‘Location’ tab.
In the 'Location' tab, users can browse genes, variations, sequence conservation, and other types of annotation along the genome.[7] The 'Region in detail' is highly configurable and scalable, and users can choose what they want to see by clicking on the 'Configure this page' button at the bottom of the left-hand menu. By adding and removing tracks users will be able to select the type of data they want to have included in the displays.[7] Data from the following categories can be easily added or removed from this 'Location' tab view: 'Sequence and assembly', 'Genes and transcripts', 'mRNA and protein alignments', 'Other DNA alignments', 'Germline variation', 'Comparative genomics', among others.[7] Users can also change the display options such as the width.[7] A further option allows users to reset the configuration back to the default settings.[7]
More specific information about a selected gene can be found in the ‘Gene’ tab. Users can get to this page by searching for the desired gene in the search bar and clicking on the gene ID, or by clicking on one of the genes shown in the ‘Location’ tab view. The ‘Gene’ tab contains gene-specific information such as gene structure, number of transcripts, position on the chromosome and homology information.[1] This information can be accessed via the menu on the left-hand side.
A 'Transcript' tab will also appear when a user chooses to view a gene. The 'Transcript' tab contains much of the same information as the 'Gene' tab, but it is focused on a single transcript.[1]
Tools
Adding Custom tracks to Ensembl Genomes
Ensembl Genomes allows comparing and visualising user data while browsing karyotypes and genes. Most Ensembl Genomes views include an ‘Add your data’ or ‘Manage your data’ button that will allow the user to upload new tracks containing reads or sequences to Ensembl Genomes or to modify data that has been previously uploaded.[8] The uploaded data can be visualised in region views or over the whole karyotype. The uploaded data can be localised using Chromosome Coordinates or BAC Clone Coordinates.[9] The following methods can be used to upload a data file to any Ensembl Genomes page:[10]
- Files smaller than 5 MB can be either uploaded directly from any computer or from a web location (URL) to the Ensembl servers.
- Larger files can only be uploaded from web locations (URL).
- BAM files can only be uploaded using the URL-based approach. The index file (.bam.bai) should be located on the same web server.
- A Distributed Annotation System source can be attached from web locations.
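The upload rules above can be sketched as a small decision function. This is only an illustration of the rules as stated in this section, not part of any Ensembl software: the function names are hypothetical, and reading "5 MB" as 5 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import PurePosixPath
from typing import List

# Assumption: the "5 MB" limit is interpreted here as 5 * 1024 * 1024 bytes.
MAX_DIRECT_UPLOAD_BYTES = 5 * 1024 * 1024


def allowed_upload_methods(filename: str, size_bytes: int) -> List[str]:
    """Return the upload routes permitted for a file (hypothetical helper)."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix == ".bam":
        # BAM files are URL-only, whatever their size.
        return ["url"]
    if size_bytes < MAX_DIRECT_UPLOAD_BYTES:
        # Small files can be uploaded directly or fetched from a URL.
        return ["direct", "url"]
    # Larger files can only be attached from a web location.
    return ["url"]


def bam_index_name(bam_filename: str) -> str:
    """Name of the index file expected next to a BAM file on the same server."""
    return bam_filename + ".bai"
```

For example, `allowed_upload_methods("genes.bed", 20_000)` allows both routes, while any `.bam` file is restricted to the URL route and needs its `.bam.bai` index alongside it.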
The following file types are supported by Ensembl Genomes:[11]
- BED
- BedGraph
- Generic
- GFF/GTF
- PSL
- WIG
- BAM
- BigBed
- BigWig
- VCF
The data is uploaded temporarily to the servers. Registered users can log in and save their data for future reference. It is possible to share and access the uploaded data using an assigned URL[12]. Users are also allowed to delete their custom tracks from Ensembl Genomes.
BioMart
BioMart is a programming-free search engine incorporated in Ensembl and Ensembl Genomes (except for Ensembl Bacteria) for mining and extracting genomic data from the Ensembl databases in table formats such as HTML, TSV, CSV or XLS[13].
BLAST
A BLAST interface is provided to allow users to search for DNA or protein sequences against the Ensembl Genomes. It can be accessed through the BLAST link in the header at the top of all Ensembl Genomes pages. The BLAST search can be configured to search against individual species or collections of species (maximum of 25). A taxonomic browser allows the selection of taxonomically related species.[14]
Sequence Search
Ensembl Genomes provides a second sequence search tool, provided by the European Nucleotide Archive, that uses an algorithm based on Exonerate.[15] This tool can be accessed through the Sequence Search link in the header at the top of all Ensembl Genomes pages. Users can then choose whether Exonerate should search against all species in the current Ensembl Genomes division or against all species in Ensembl Genomes. They can also set the 'Maximum E-value', which limits the results to those with E-values below the maximum. Finally, users can choose an alternative search mode by selecting 'Use spliced query'.
Variant Effect Predictor
The Variant Effect Predictor (VEP) is one of the most used tools in Ensembl and Ensembl Genomes. It allows users to explore and analyse the effect that variants (SNPs, CNVs, indels or structural variations) have on a particular gene, sequence, protein, transcript or transcription factor[16]. To use VEP, users must input the location of their variants and the nucleotide variations to generate the following results[17]:
- Genes and transcripts affected by the variant
- Location of the variants
- How the variant affects the protein synthesis (e.g. generating a stop codon)
- Comparison with other databases to find equal known variants
There are two ways in which users can access VEP. The first is online. On this page, the user generates an input by selecting the following parameters[18]:
- Species to be compared. The default database for comparison is Ensembl Transcripts, but for some species, other sources can be selected.
- Name for the uploaded data (this is optional, but it makes the data easier to identify if many VEP jobs have been performed)
- Selection of the input format for the data. If an incorrect file format is selected, VEP will throw an error when running.
- Fields for data upload. Users can upload data from their computers, from a URL-based location or by copying the contents directly into a text box.
Data upload to VEP supports VCF, pileup, HGVS notations and a default format[19]. The default format is a whitespace-separated file that contains the data in columns. The first five columns indicate the chromosome, start location, end location, allele (pair of alleles separated by a '/', with the reference allele first) and the strand (+ for forward or – for reverse)[20]. The sixth column is a variation identifier and it is optional. If it is left blank, VEP will assign an identifier in the output file.
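The column layout of the default format can be illustrated with a minimal parser. This is a sketch written from the description above, not VEP's own code: the class and function names are hypothetical, the strand is assumed to be written as a plain "+" or "-", and the example line is made up.

```python
from typing import NamedTuple, Optional


class VariantLine(NamedTuple):
    chromosome: str
    start: int
    end: int
    allele: str                 # e.g. "A/C", reference allele first
    strand: str                 # "+" (forward) or "-" (reverse)
    identifier: Optional[str]   # optional sixth column


def parse_default_line(line: str) -> VariantLine:
    """Parse one whitespace-separated line of the default input format."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(fields)}")
    chromosome, start, end, allele, strand = fields[:5]
    if "/" not in allele:
        raise ValueError("allele column must look like REF/ALT")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The identifier is optional; None means it is left for VEP to assign.
    identifier = fields[5] if len(fields) == 6 else None
    return VariantLine(chromosome, int(start), int(end), allele, strand, identifier)


# Made-up example line: chromosome 1, position 65568, A changed to C.
example = parse_default_line("1 65568 65568 A/C + var1")
```

Validating the column count and strand up front mirrors the point made above that a mismatched input format causes an error at run time.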
VEP also provides additional identifier options to the users, extra options to complement the output and filtering[21]. The filtering options allow features like removal of known variants from results, returning variants in exons only, and restriction of results to specific consequences of the variants[22].
VEP users can also view and manipulate all the jobs associated with their session by browsing the "Recent Tickets" tab. In this tab, users can view the status of their search (success, queued, running or failed) and save, delete or resubmit jobs[23].
The second option is to download the VEP source code for use in UNIX environments[24]. The online and script versions offer the same features. VEP can also be used through online instances such as Galaxy.
When a VEP job is completed the output is a tabular file that contains the following columns [25]:
- Uploaded variation - as chromosome_start_alleles
- Location - in standard coordinate format (chr:start or chr:start-end)
- Allele - the variant allele used to calculate the consequence
- Gene - Ensembl stable ID of affected gene
- Feature - Ensembl stable ID of feature
- Feature type - type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature.
- Consequence - consequence type of this variation
- Position in cDNA - relative position of base pair in cDNA sequence
- Position in CDS - relative position of base pair in coding sequence
- Position in protein - relative position of amino acid in protein
- Amino acid change - only given if the variation affects the protein-coding sequence
- Codon change - the alternative codons with the variant base in upper case
- Co-located variation - known identifier of existing variation
- Extra - this column contains extra information as key=value pairs separated by ";". Displays extra identifiers.
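A row of this output can be read into a dictionary keyed by the column names listed above. The following is a minimal sketch under stated assumptions: columns are assumed to be tab-separated, "-" is assumed to mark an empty field, the helper names are hypothetical, and the sample row uses made-up identifiers.

```python
from typing import Dict, Tuple

COLUMNS = [
    "Uploaded variation", "Location", "Allele", "Gene", "Feature",
    "Feature type", "Consequence", "Position in cDNA", "Position in CDS",
    "Position in protein", "Amino acid change", "Codon change",
    "Co-located variation", "Extra",
]


def parse_extra(extra: str) -> Dict[str, str]:
    """Split the Extra column into its key=value pairs (separated by ';')."""
    if extra in ("", "-"):  # assumption: "-" marks an empty field
        return {}
    return dict(item.split("=", 1) for item in extra.split(";") if "=" in item)


def parse_location(location: str) -> Tuple[str, int, int]:
    """Parse 'chr:start' or 'chr:start-end' into (chr, start, end)."""
    chromosome, _, span = location.partition(":")
    start, _, end = span.partition("-")
    return chromosome, int(start), int(end) if end else int(start)


def parse_output_line(line: str) -> dict:
    """Map one tab-separated output row onto the column names."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    row["Extra"] = parse_extra(row["Extra"])
    return row


# Made-up row; GENE0001 and TRANSCRIPT0001 are placeholders, not real stable IDs.
sample = "\t".join([
    "1_100_A/G", "1:100", "G", "GENE0001", "TRANSCRIPT0001", "Transcript",
    "missense_variant", "10", "10", "4", "K/E", "Aag/Gag", "-", "STRAND=1;KEY=value",
])
row = parse_output_line(sample)
```

Keeping Extra as a nested dictionary leaves the fixed columns flat while still exposing whatever additional keys a given run adds.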
Other common output formats for VEP include JSON and VCF[26].
Current species
- The bacterial division of Ensembl now contains all bacterial genomes that have been completely sequenced, annotated and submitted to the International Nucleotide Sequence Database Collaboration (European Nucleotide Archive, GenBank and the DNA Database of Japan).[27] The current dataset contains 15,270 genomes.[28]
- Ensembl Fungi contains 52 genomes [29]
- Ensembl Metazoa contains 54 genomes [30]
- Ensembl Plants contains 38 genomes [31]
- Ensembl Protists contains 32 genomes [32]
Collaborations
Ensembl Genomes continuously expands the annotation data through collaboration with other organisations involved in genome annotation projects and research. The following organisations are collaborators of Ensembl Genomes:[33]
- AllBio
- Barley
- Culicoides sonorensis
- Gramene
- INFRAVEC
- Microme
- PomBase
- PhytoPath
- transPLANT
- Triticeae Genomics for Sustainable Agriculture
- VectorBase
- Wheat Rust Genomic Improvement
- WormBase
- WormBase ParaSite
See also
References
- ^ a b c d Template:Cite PMID
- ^ Template:Cite PMID
- ^ "About Ensembl Genomes". http://ensemblgenomes.org/info/about. Ensembl. Retrieved 2 September 2014.
- ^ Kinsella, Rhoda J.; Kähäri, Andreas; Syed, Haider; Zamora, Jorge; Proctor, Glenn; Spudich, Giulietta; Almeida-King, Jeff; Staines, Daniel; Derwent, Paul; Kerhournou, Arnaud; Kersey, Paul; Flicek, Paul (2011). "Ensembl BioMarts: a hub for data retrieval across taxonomic space". Database. 2011 (2011): 2. Retrieved 3 September 2014.
- ^ Flicek, Paul; Amode, Ridwan; Barrnell, Daniel; Beal, Kathryn; Billis, Konstantinos; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fitzgerald, Stephen; Gil, Laurent; García Girón, Carlos; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah; Johnson, Nathan; Juttemann, Thomas; Kähäri, Andreas; Keenan, Stephen; Kulesha, Eugene; Martin, Fergal; Maurel, Thomas; McLaren, William; Murphy, Daniel; Nag, Rishi; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Pritchard, Emily; Riat, Harpreet; Ruffier, Magali; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Trevanion, Stephen; Vullo, Alessandro; Wilder, Steven; Wilson, Mark; Zadissa, Amonida; Aken, Brownen; Birney, Ewan; Cunningham, Fiona; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim; Kinsella, Rhoda; Muffato, Matthieu; Parker, Anne; Spudich, Giulietta; Yates, Andy; Zerbino, Daniel; Searle, Stephen (2014). "Ensembl 2014". Nucleic Acids Research. 42: 1. Retrieved 8 September 2014.
- ^ a b "Whole Genome". Ensembl Genomes. Retrieved 7 September 2014.
- ^ a b c d e "Frequently Asked Questions". Ensembl Genomes. Retrieved 7 September 2014.
- ^ "Uploading your data to Ensembl". Ensembl Genomes. Ensembl Genomes. Retrieved 9 September 2014.
- ^ "Coordinates for data location in Ensembl Genomes". Ensembl Genomes. Ensembl Genomes. Retrieved 9 September 2014.
- ^ "Methods for data upload". Ensembl Plants. Ensembl Genomes. Retrieved 9 September 2014.
- ^ "Supported data files". Ensembl Plants. Ensembl Genomes. Retrieved 9 September 2014.
- ^ "Saving and Sharing data in Ensembl Genomes". Ensembl Plants. Ensembl Genomes.
- ^ "Data Mining in Ensembl with BioMart" (PDF). Ensembl: 2. 2014. Retrieved 11 September 2014.
- ^ "Frequently Asked Questions". Ensembl Genomes. Retrieved 11 September 2014.
- ^ "Frequently Asked Questions". Ensembl Genomes. Retrieved 11 September 2014.
- ^ "Variant Effect Predictor". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "Variant Effect Predictor results overview". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "Data input to VEP". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP supported file formats". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP default file". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP options and extras". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP filtering". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP jobs". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP script download". ensembl.org. Ensembl. Retrieved 11 September 2014.
- ^ "VEP Output". ensembl.org. Ensembl Genomes. Retrieved 11 September 2014.
- ^ "VEP Output formats". ensembl.org. Ensembl Genomes. Retrieved 11 September 2014.
- ^ Template:Cite PMID
- ^ "Ensembl Bacteria". Ensembl Genomes. Retrieved 6 September 2014.
- ^ "Ensembl Fungi Species". Ensembl Genomes. Retrieved 6 September 2014.
- ^ "Ensembl Metazoa Species". Ensembl Genomes. Retrieved 6 September 2014.
- ^ "Ensembl Plants Species". Ensembl Genomes. Retrieved 6 September 2014.
- ^ "Ensembl Protists Species". Ensembl Genomes. Retrieved 6 September 2014.
- ^ "Collaborators - Ensembl Genomes". http://ensemblgenomes.org/info/about/collaborations. Ensembl Genomes. Retrieved 3 September 2014.