Talk:BLOSUM: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 21:30, 28 May 2007


BLOSUM62: more or less than 62% identity?

"The Henikoffs took a big database of trusted alignments (their BLOCKS database), and (in effect) only counted pairwise sequence alignments related by less than some threshold percentage identity. A threshold of 62% identity or less resulted in the target frequencies for the BLOSUM62 matrix. An 80% threshold gave the more highly conserved target frequencies of the BLOSUM80 matrix, and a 45% threshold gave the more divergent BLOSUM45 matrix."

Source: Sean R. Eddy, "Where did the BLOSUM62 alignment score matrix come from?", Nature Biotechnology 22, 1035-1036 (2004), doi:10.1038/nbt0804-1035

http://www.nature.com/nbt/journal/v22/n8/full/nbt0804-1035.html


"In order to avoid over-weighting closely-related sequences, the Henikoffs replaced groups of proteins that have sequence identities higher than a threshold by either a single representative or a weighted average. The threshold of 62% produces the commonly used BLOSUM62 substitution matrix."

Source: Arthur M. Lesk, Introduction to Bioinformatics, Oxford University Press, 2002, p. 175

Winterschlaefer 15:52, 14 February 2007 (UTC)
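
To make the threshold in the Eddy and Lesk quotes above concrete, here is a minimal Python sketch of grouping the segments of one block by percent identity. It is an illustration only, not the Henikoffs' actual program: the function names `percent_identity` and `cluster_segments` are hypothetical, and it assumes single-linkage grouping (a segment joins a cluster if it reaches the threshold against any member).

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two equal-length segments agree."""
    if len(a) != len(b):
        raise ValueError("block segments must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Group segment indices so that any pair with identity >= threshold
    ends up in the same cluster (single linkage, an assumption here)."""
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if percent_identity(segments[i], segments[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(segments)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy block of four ungapped segments (made-up data).
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEWWWWWW", "MNPQRSTVWY"]
print(cluster_segments(block, 62))
print(cluster_segments(block, 40))
```

With a threshold of 62, only the first two toy segments (90% identical) are merged, so pairs that are counted between clusters are all below 62% identity. Lowering the threshold merges more segments and leaves only more divergent pairs to be counted.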


As far as I know, a BLOSUM62 matrix is good for alignments which have 62% or MORE identity. XApple 00:32, 25 February 2007 (UTC)


I agree with Winterschlaefer. For BLOSUM62, the Henikoffs clustered all sequences that are 62% or more identical and weighted each cluster as one single sequence, so closely related sequences contribute less to the matrix. As the paper reads,

"To reduce multiple contributions to amino acid pair frequencies from the most closely related members of a family, sequences are clustered within blocks and each cluster is weighted as a single sequence in counting pairs. This is done by specifying a clustering percentage in which sequence segments that are identical for at least that percentage of amino acids are grouped together."

Also, as I can read in the history of this article, the following statement used to be part of the references section:

"BLOSUM62 is for sequences of 62% OR GREATER sequence identity, not less than 62% (Voet, D., Voet,J., 2005)"

and this may well be what Voet & Voet claim. However, this is different from the following statement, which is now referenced with Voet & Voet:

"BLOSUM62 is the matrix calculated by using the observed substitutions between proteins which have 62% or more"

The BLOSUM62 matrix actually is calculated (primarily) from sequences which have 62% or less sequence identity. Still, IMHO, BLOSUM62 is designed for sequences with similarities around 62%. If I wanted to compare sequences with a similarity of 80%, I'd choose BLOSUM80.

Source: Henikoff & Henikoff, "Amino acid substitution matrices from protein blocks", PNAS 89, pp. 10915-10919 (1992). 134.34.4.5 21:09, 28 May 2007 (UTC)
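
The "each cluster is weighted as a single sequence in counting pairs" step quoted above can be sketched as follows. This is a rough illustration under stated assumptions, not the published implementation: `weighted_pair_counts` is a hypothetical name, each segment is assumed to get weight 1/(cluster size), pairs inside one cluster are assumed not to be counted, and the cluster assignment is taken as given.

```python
from collections import Counter
from itertools import combinations


def weighted_pair_counts(segments, clusters):
    """Count aligned amino acid pairs between clusters of one block.

    segments: equal-length strings, the rows of an ungapped block.
    clusters: list of lists of segment indices (assumed to be given).
    Each segment carries weight 1/len(its cluster), so a whole cluster
    contributes like a single sequence (an assumption of this sketch).
    """
    weight = {}
    cluster_of = {}
    for c, members in enumerate(clusters):
        for i in members:
            weight[i] = 1.0 / len(members)
            cluster_of[i] = c

    counts = Counter()
    for i, j in combinations(range(len(segments)), 2):
        if cluster_of[i] == cluster_of[j]:
            continue  # pairs within a cluster are skipped
        w = weight[i] * weight[j]
        for x, y in zip(segments[i], segments[j]):
            # Store pairs unordered: (A, S) and (S, A) share one key.
            counts[tuple(sorted((x, y)))] += w
    return counts


# Toy block: two identical segments and one divergent segment (made-up data).
block = ["AA", "AA", "AS"]
clustered = weighted_pair_counts(block, [[0, 1], [2]])
unclustered = weighted_pair_counts(block, [[0], [1], [2]])
print(clustered[("A", "A")], clustered[("A", "S")])
print(unclustered[("A", "A")], unclustered[("A", "S")])
```

In the toy block, clustering the two identical segments reduces the A-A count relative to the A-S count, which is the sense in which highly similar sequences contribute less to the matrix.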

Illustration

This badly needs a picture of a typical BLOSUM matrix. XApple 14:52, 12 February 2007 (UTC)