Talk:Entrez
This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: United States, National Institutes of Health, Academic Journals, Molecular Biology, and Computational Biology.
Untitled
I'm really frustrated by the quality of the documentation here: http://www.ncbi.nlm.nih.gov/entrez/query/enwiki/static/eutils_help.html Specifically, I'd like to find a table of allowed retmode and rettype values, broken down by database. I'm sure there was a table like this somewhere on the help page, but I can't find it. --Dan|(talk) 17:51, 28 January 2009 (UTC)
eUtils
I found the following information on eUtils invaluable (but hidden) [1]. I'll try to pare it down and add it to the article at some point. --Dan|(talk) 09:05, 29 January 2009 (UTC)
EFetch for Sequence and other Molecular Biology Databases
EFetch documentation is also available for the Literature and Taxonomy databases.
EFetch: Retrieves records in the requested format from a list of one or more unique identifiers.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
URL parameters
(NOTE: Utility parameters may be case-sensitive. Use lowercase characters for all parameters except WebEnv.)
- Database (db)
- Web Environment (WebEnv)
- Query key (query_key)
- Tool (tool)
- E-mail address (email)
- Record identifier (id)
- Display Numbers (retstart, retmax)
- Parameters specific to sequence databases
- Retrieval mode or output format
- Retrieval type
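To make the parameter list concrete, here is a minimal sketch of assembling an EFetch URL from these parameters. It assumes only the base URL and the db values quoted in this document; `efetch_url` and `SEQUENCE_DBS` are hypothetical names, and the lower-casing rule follows the case-sensitivity note above.

```python
from urllib.parse import urlencode

# Base URL exactly as quoted in this 2009 copy of the documentation.
EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"

# db values from the "Database (db)" list in this document.
SEQUENCE_DBS = {
    "gene", "genome", "nucleotide", "nuccore", "nucest",
    "nucgss", "protein", "popset", "snp", "sequences",
}


def efetch_url(**params):
    """Build an EFetch URL from keyword parameters (hypothetical helper).

    Parameter names are lower-cased, except WebEnv, which keeps its
    mixed-case form, following the case-sensitivity note.
    """
    normalized = {}
    for key, value in params.items():
        name = "WebEnv" if key.lower() == "webenv" else key.lower()
        normalized[name] = value
    db = normalized.get("db")
    if db not in SEQUENCE_DBS:
        raise ValueError(f"db must be a sequence database, got {db!r}")
    # urlencode percent-escapes each value and joins the pairs with '&'.
    return EFETCH_BASE + urlencode(normalized)


# Reproduces the second URL listed under "Examples".
print(efetch_url(db="nucleotide", id=5, rettype="gb"))
```

Normalizing names in one place means callers cannot accidentally send `RetType=` or `webenv=`.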
Database (db)
- db=
Current database values:
- gene
- genome
- nucleotide
- nuccore
- nucest
- nucgss
- protein
- popset
- snp
- sequences (composite name, including nucleotide, protein, popset and genome)
Web Environment
History link value previously returned in the XML results from ESearch; used with EFetch in place of a list of primary IDs.
- WebEnv=WgHmIcDG], etc.
Query_key
The history search number, either from the search history or as previously returned in the XML results from ESearch or EPost.
- query_key=6
Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both WebEnv (cookie string) and query_key (history number) values will be returned in the results. Rather than using the retrieved PMIDs in an ESummary URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
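The history workflow described in the note can be sketched as two steps: read WebEnv and query_key out of an ESearch result, then pass them to EFetch instead of an id list. This is a sketch under assumptions: it expects `QueryKey` and `WebEnv` to be direct children of the ESearch result's root element, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"


def history_from_esearch(xml_text):
    """Return (WebEnv, query_key) from an ESearch result run with usehistory=y.

    Assumes <QueryKey> and <WebEnv> sit directly under the root element.
    """
    root = ET.fromstring(xml_text)
    web_env = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    if web_env is None or query_key is None:
        raise ValueError("no history values found; was usehistory=y set?")
    return web_env, query_key


def efetch_from_history(db, web_env, query_key, **extra):
    """Build an EFetch URL that uses WebEnv/query_key in place of id=."""
    params = {"db": db, "WebEnv": web_env, "query_key": query_key}
    params.update(extra)
    return EFETCH_BASE + urlencode(params)
```

Because WebEnv changes with each ESearch query, the pair should be taken from the same ESearch response rather than cached across searches.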
Tool
A string with no internal spaces that identifies the resource that is using Entrez links (e.g., tool=flybase). This argument helps NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question in different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument in all requests to the utilities.
- tool=
E-mail Address
If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
- email=
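One way to honour the request for a constant tool argument is to attach tool (and, optionally, email) to every parameter set in a single place. A minimal sketch; `TOOL`, `EMAIL` and `with_contact` are hypothetical placeholders, not values NCBI assigns.

```python
from urllib.parse import urlencode

# Hypothetical values: substitute your own tool name and contact address.
TOOL = "my_pipeline"
EMAIL = "someone@example.org"


def with_contact(params, tool=TOOL, email=EMAIL):
    """Return a copy of params with the constant tool (and optional email) added."""
    if not tool or any(ch.isspace() for ch in tool):
        raise ValueError("tool must be a non-empty string with no spaces")
    merged = dict(params)
    merged["tool"] = tool
    if email:  # email is optional per the text above
        merged["email"] = email
    return merged


print(urlencode(with_contact({"db": "protein", "id": 8})))
# db=protein&id=8&tool=my_pipeline&email=someone%40example.org
```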
Sequence Databases
You got to love this document.
Record Identifier
IDs are required if WebEnv is not used.
- id=123,U12345,U12345.1,gb|U12345|
Current values:
- NCBI sequence number (GI)
- accession
- accession.version
- fasta
- GeneID
- genome ID
- seqid
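The id= example above mixes several identifier types in one comma-separated value. A small sketch of building that value follows; `id_param` is a hypothetical helper, and it assumes the server decodes percent-escaped commas and pipes as ordinary URL encoding.

```python
from urllib.parse import urlencode


def id_param(ids):
    """Join GIs, accessions, accession.versions, etc. into one id= value."""
    cleaned = [str(i).strip() for i in ids]
    if not cleaned or not all(cleaned):
        raise ValueError("need at least one non-empty identifier")
    return ",".join(cleaned)


value = id_param([123, "U12345", "U12345.1", "gb|U12345|"])
print(value)                     # 123,U12345,U12345.1,gb|U12345|
print(urlencode({"id": value}))  # id=123%2CU12345%2CU12345.1%2Cgb%7CU12345%7C
```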
Display Numbers
- retstart=x (x = sequential number of the first record retrieved; default = 0, which retrieves the first record)
- retmax=y (y= number of items retrieved)
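retstart and retmax together allow paging through a large result set, for example one stored on the history server via WebEnv/query_key. A minimal sketch of generating the (retstart, retmax) pairs; `pages` is a hypothetical helper and `count` is assumed to be the total number of records (e.g., taken from an ESearch result).

```python
def pages(count, batch_size):
    """Yield (retstart, retmax) pairs covering records 0 .. count-1.

    retstart is 0-based, matching the default of 0 for the first record.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for retstart in range(0, count, batch_size):
        # The last batch may be shorter than batch_size.
        yield retstart, min(batch_size, count - retstart)


print(list(pages(250, 100)))  # [(0, 100), (100, 100), (200, 50)]
```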
Parameters specific to sequence databases
Sequence Strand, Start, Stop and Complexity Parameters
strand= | which DNA strand to show (1 = plus, 2 = minus) |
seq_start= | show the sequence starting from this base number |
seq_stop= | show the sequence ending at this base number |
complexity= | a GI is often part of a biological blob that contains other GIs |
Complexity regulates the display:
- 0 - get the whole blob
- 1 - get the bioseq for gi of interest (default in Entrez)
- 2 - get the minimal bioseq-set containing the gi of interest
- 3 - get the minimal nuc-prot containing the gi of interest
- 4 - get the minimal pub-set containing the gi of interest
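The strand, range and complexity parameters can be validated before a request is sent. This sketch rests on assumptions the text does not state: base numbers are treated as 1-based and inclusive, and seq_start <= seq_stop is required on both strands. `subsequence_params` is a hypothetical helper.

```python
def subsequence_params(seq_start, seq_stop, strand=1, complexity=1):
    """Build strand/seq_start/seq_stop/complexity parameters for EFetch.

    Assumption: 1-based, inclusive base numbers with seq_start <= seq_stop
    for either strand. complexity defaults to 1, the Entrez default.
    """
    if strand not in (1, 2):
        raise ValueError("strand must be 1 (plus) or 2 (minus)")
    if complexity not in range(5):
        raise ValueError("complexity must be an integer from 0 to 4")
    if seq_start < 1 or seq_stop < seq_start:
        raise ValueError("need 1 <= seq_start <= seq_stop")
    return {
        "strand": strand,
        "seq_start": seq_start,
        "seq_stop": seq_stop,
        "complexity": complexity,
    }
```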
Retrieval Mode
- retmode=output format
Current values:
- xml
- html
- text
- asn.1
Retrieval Type
- rettype=output types based on database
Current values and descriptions:
rettype | scope | description |
native (full record) | all but Gene | Default format for viewing sequences. |
fasta | sequence only | FASTA view of a sequence. |
gb | nucleotide sequence only | GenBank view for sequences; constructed sequences are shown as contigs (by pointing to their parts). |
gbc | nucleotide sequence only | INSDSeq structured flat file. |
gbwithparts | nucleotide sequence only | GenBank view for sequences; the sequence is always shown. |
est | dbEST sequence only | EST Report. |
gss | dbGSS sequence only | GSS Report. |
gp | protein sequence only | GenPept view. |
gpc | protein sequence only | INSDSeq structured flat file. |
seqid | sequence only | Converts a list of GIs into a list of SeqIDs. |
acc | sequence only | Converts a list of GIs into a list of accessions. |
chr | dbSNP only | SNP Chromosome Report. |
flt | dbSNP only | SNP Flat File report. |
rsr | dbSNP only | SNP RS Cluster report. |
brief | dbSNP only | SNP ID list. |
docset | dbSNP only | SNP RS summary. |
Not all Retrieval Modes are possible with all Retrieval Types.
Sequence Options
retmode / rettype | native | fasta | gb | gbwithparts | est | gss | gp | seqid | acc | gbc | gpc |
xml | x | x* | n/a | n/a | TBI | TBI | n/a | TBI | TBI | x | x |
text | x | x | x* | x* | x* | x* | x* | x | x | n/a | n/a |
html | x | x | x* | x* | x* | x* | x* | x | x | n/a | n/a |
asn.1 | x | n/a | n/a | n/a | n/a | n/a | n/a | x | n/a | n/a | n/a |
x = retrieval mode available; * = availability depends on the GI type; TBI = to be implemented (not yet available); n/a = not available
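The Sequence Options table can be encoded directly so that a client rejects unsupported retmode/rettype pairs before sending a request. The values below are copied from the table; treat them as a 2009 snapshot, since the table marks xml with gb as n/a while the Examples section uses rettype=gb&retmode=xml. The names `RETTYPES`, `MATRIX`, `availability` and `is_usable` are hypothetical.

```python
# Columns and rows copied from the "Sequence Options" table.
RETTYPES = ["native", "fasta", "gb", "gbwithparts", "est", "gss",
            "gp", "seqid", "acc", "gbc", "gpc"]
MATRIX = {
    "xml":   ["x", "x*", "n/a", "n/a", "TBI", "TBI", "n/a", "TBI", "TBI", "x", "x"],
    "text":  ["x", "x", "x*", "x*", "x*", "x*", "x*", "x", "x", "n/a", "n/a"],
    "html":  ["x", "x", "x*", "x*", "x*", "x*", "x*", "x", "x", "n/a", "n/a"],
    "asn.1": ["x", "n/a", "n/a", "n/a", "n/a", "n/a", "n/a", "x", "n/a", "n/a", "n/a"],
}


def availability(retmode, rettype):
    """Return the table cell: 'x', 'x*', 'TBI' or 'n/a'."""
    return MATRIX[retmode][RETTYPES.index(rettype)]


def is_usable(retmode, rettype):
    """True if the table marks the pair as available ('x*' depends on GI type)."""
    return availability(retmode, rettype) in ("x", "x*")
```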
Examples
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=popset&id=12829836&rettype=gp
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp
Entrez display format GBSeqXML:
- http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb&retmode=xml
- http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp&retmode=xml
Entrez display format TinySeqXML: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta&retmode=xml
Entrez Gene, full display as xml: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&retmode=xml