Talk:BLOSUM: Difference between revisions

Revision as of 03:40, 4 June 2008

BLOSUM62: more or less than 62% identity?

"The Henikoffs took a big database of trusted alignments (their BLOCKS database), and (in effect) only counted pairwise sequence alignments related by less than some threshold percentage identity. A threshold of 62% identity or less resulted in the target frequencies for the BLOSUM62 matrix. An 80% threshold gave the more highly conserved target frequencies of the BLOSUM80 matrix, and a 45% threshold gave the more divergent BLOSUM45 matrix."

Source: Sean R. Eddy, Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22, 1035-1036 (2004), doi:10.1038/nbt0804-1035

http://www.nature.com/nbt/journal/v22/n8/full/nbt0804-1035.html

"In order to avoid over-weighting closely-related sequences, the Henikoffs replaced groups of proteins that have sequence identities higher than a threshold by either a single representative or a weighted average. The threshold of 62% produces the commonly used BLOSUM62 substitution matrix."

Source: Arthur M. Lesk, Introduction to Bioinformatics, Oxford University Press, 2002, p. 175

Winterschlaefer 15:52, 14 February 2007 (UTC)

As far as I know, a BLOSUM62 matrix is good for alignments which have 62% or MORE identity. XApple 00:32, 25 February 2007 (UTC)

I agree with Winterschlaefer. For BLOSUM62, the Henikoffs weighted all the sequences with an identity of 62% or more as one single sequence, so that they contribute less to the matrix. As the paper reads,

"To reduce multiple contributions to amino acid pair frequencies from the most closely related members of a family, sequences are clustered within blocks and each cluster is weighted as a single sequence in counting pairs. This is done by specifying a clustering percentage in which sequence segments that are identical for at least that percentage of amino acids are grouped together."

Also, as I can see in the history of this article, the following statement used to be part of the references section: "BLOSUM62 is for sequences of 62% OR GREATER sequence identity, not less than 62% (Voet, D., Voet,J., 2005)", and this may well be what Voet & Voet claim. However, this is different from the following statement, which is now referenced with Voet & Voet: "BLOSUM62 is the matrix calculated by using the observed substitutions between proteins which have 62% or more". What I'm saying is that this reference does not support this claim. The BLOSUM62 matrix is actually calculated (primarily) from sequences which have 62% or less sequence identity. Still, IMHO, BLOSUM62 is designed for sequences with similarities around 62%, not more. If I wanted to compare sequences with a similarity of 80%, I'd choose BLOSUM80.

Source: Henikoff & Henikoff, Amino acid substitution matrices from protein blocks, PNAS 89, pp. 10915-10919 (1992)

134.34.4.5 21:09, 28 May 2007 (UTC)
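To make the clustering step quoted above concrete, here is a minimal Python sketch of how pairs might be counted for one ungapped block. It is only an illustration under stated assumptions, not the Henikoffs' actual program: the function names (`percent_identity`, `cluster_segments`, `weighted_pair_counts`) and the toy block are made up for this example, and single-linkage clustering (a segment joins a cluster if it reaches the threshold against any member) is my reading of the paper's description.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a, b):
    """Percentage of aligned positions at which two equal-length segments match."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Group segments by single linkage (assumption): a segment joins every
    cluster containing a member that is >= threshold percent identical to it,
    and those clusters are merged."""
    clusters = []
    for seg in segments:
        linked, unlinked = [], []
        for cluster in clusters:
            if any(percent_identity(seg, member) >= threshold for member in cluster):
                linked.append(cluster)
            else:
                unlinked.append(cluster)
        merged = [seg] + [member for cluster in linked for member in cluster]
        clusters = unlinked + [merged]
    return clusters


def weighted_pair_counts(segments, threshold):
    """Count amino acid pairs column by column, with each cluster weighted
    as a single sequence. Pairs inside a cluster are not counted at all."""
    clusters = cluster_segments(segments, threshold)
    counts = Counter()
    width = len(segments[0])
    for ci, cj in combinations(clusters, 2):
        weight = 1.0 / (len(ci) * len(cj))  # each cluster counts as one sequence
        for a in ci:
            for b in cj:
                for col in range(width):
                    pair = tuple(sorted((a[col], b[col])))  # unordered pair
                    counts[pair] += weight
    return counts


# Hypothetical toy block of three segments, four columns wide.
block = ["AAAA", "AAAC", "CCCC"]
counts_62 = weighted_pair_counts(block, 62)  # AAAA and AAAC (75% identical) are clustered
counts_80 = weighted_pair_counts(block, 80)  # nothing reaches 80%, so no clustering
```

In this toy block the 62% threshold merges the two closely related segments, so the A-A pairs between them disappear from the counts and the remaining pairs come only from comparisons with the divergent segment. That is the sense in which BLOSUM62 is built (primarily) from pairs that are less than 62% identical.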

It is definitely the case that BLOSUM62 is based only on sequences that have 62% or more identity, while BLOSUM80 is based on sequences with 80% or more identity. Which one you use is up to your personal taste, but as far as I know you would use a BLOSUM matrix that is around your sequence identity; on that point I agree with the commenter above. The error was fixed here: http://en.wikipedia.org/enwiki/w/index.php?title=BLOSUM&diff=215796110&oldid=215789952. Greetings--hroest 03:39, 4 June 2008 (UTC)

Illustration

This badly needs a picture of a typical BLOSUM matrix. XApple 14:52, 12 February 2007 (UTC)

It did get one. --hroest 05:50, 7 March 2008 (UTC)

Revision as of 03:39, 4 June 2008 by Hannes Röst, compared with revision as of 03:40, 4 June 2008 by Hannes Röst (section: BLOSUM62: more or less than 62% identity?)

Line 48: the only change is a leading ":" added to indent hroest's reply of 03:39, 4 June 2008 (UTC). The text of the reply and the following "== Illustration ==" heading are unchanged.