| [[File:BLOSUM62.gif|thumb|400px|The BLOSUM62 matrix]]
| In [[bioinformatics]], the '''BLOSUM''' ('''BLO'''cks '''SU'''bstitution '''M'''atrix) matrix is a [[substitution matrix]] used for [[sequence alignment]] of [[protein]]s. BLOSUM matrices are used to score alignments between [[Evolutionary divergence|evolutionarily divergent]] protein sequences. They are based on local alignments. BLOSUM matrices were first introduced in a paper by Henikoff and Henikoff.<ref name=henikoff>{{cite journal| year=1992| journal=PNAS | volume=89 | pages=10915–10919| pmid=1438297 | title = Amino Acid Substitution Matrices from Protein Blocks | doi = 10.1073/pnas.89.22.10915 | author = Henikoff, S. | coauthors = Henikoff, J.G.| issue=22| pmc=50453}}</ref> They scanned the [[BLOCKS database]] for highly [[Conserved sequence|conserved regions]] of protein families (regions that have no gaps in the sequence alignment) and counted the relative frequencies of [[amino acids]] and of the substitutions between them. They then calculated a [[odds ratio|log-odds]] score for each of the 210 possible pairs of the 20 standard amino acids (190 substitution pairs plus 20 identities). All BLOSUM matrices are based on observed alignments; unlike the [[Point accepted mutation|PAM matrices]], they are not extrapolated from comparisons of closely related proteins.
| |
| | |
| Several sets of BLOSUM matrices exist, built from different alignment databases and named with numbers. BLOSUM matrices with high numbers are designed for comparing closely related sequences, while those with low numbers are designed for comparing distantly related sequences. For example, BLOSUM80 is used for less divergent alignments, and BLOSUM45 is used for more divergent alignments. The matrices were created by merging (clustering) all sequences that were more similar than a given percentage into one single sequence and then comparing only the resulting sequences (which were all more divergent than the given percentage), thus reducing the contribution of closely related sequences. The percentage used was appended to the name: BLOSUM80, for example, is the matrix for which sequences that were more than 80% identical were clustered.
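
The clustering step described above can be illustrated with a minimal sketch. It assumes single-linkage clustering of equal-length, ungapped segments and treats "at least the threshold" as the linking rule; the function names and the toy segments are hypothetical and are not taken from the original BLOSUM software.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percentage of positions at which two equal-length segments agree."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length (blocks are ungapped)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Single-linkage clustering: link two segments if identity >= threshold.

    The >= comparison is an assumption of this sketch.
    """
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if percent_identity(segments[i], segments[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seg in enumerate(segments):
        clusters.setdefault(find(i), []).append(seg)
    return list(clusters.values())


# Toy block: three near-identical segments and one unrelated segment.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHLKV", "MNPQRSTVWY"]
clusters_80 = cluster_segments(block, 80.0)  # the three similar segments merge
clusters_95 = cluster_segments(block, 95.0)  # nothing merges
print(len(clusters_80), len(clusters_95))
```

With a lower threshold more segments are merged, so fewer closely related pairs contribute to the counts, which is why low-numbered matrices suit more divergent sequences.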
| |
| | |
| Scores within a BLOSUM matrix are log-odds scores: for each pair of aligned amino acids, the score is the logarithm of the ratio between the likelihood that the two amino acids are aligned for a biological reason and the likelihood that they are aligned by chance.<ref name=handbook>{{cite book | url=http://books.google.com/?id=kDFltuQo1dMC&pg=PA673&lpg=PA673&dq=blosum+matrix | title=Handbook of Nature-Inspired And Innovative Computing | isbn=0-387-40532-1 | author=Albert Y. Zomaya | year=2006 | publisher=Springer | location=New York, NY}}, p. 673</ref> The matrices are based on the minimum percentage identity of the aligned protein sequences used in calculating them.<ref name=handbook /> Every possible identity or substitution is assigned a score based on its observed frequency in the alignment of related proteins.<ref>[http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html NIH "Scoring Systems"]</ref> A positive score is given to the more likely substitutions, while a negative score is given to the less likely substitutions.
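
As an illustration of how such scores are used, the following sketch sums substitution scores over an ungapped alignment. The dictionary holds only a handful of entries quoted from the commonly distributed BLOSUM62 table; the helper names are hypothetical, and the values should be checked against the published data files before any real use.

```python
# Small excerpt of BLOSUM62 scores; only the pairs needed for this example.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4,
    ("W", "W"): 11,
    ("I", "L"): 2,
    ("D", "E"): 2,
    ("G", "W"): -2,
}


def pair_score(a, b, matrix=BLOSUM62_EXCERPT):
    """Look up a symmetric substitution score; key order does not matter."""
    key = (a, b) if (a, b) in matrix else (b, a)
    return matrix[key]  # raises KeyError for pairs missing from the excerpt


def ungapped_score(seq1, seq2, matrix=BLOSUM62_EXCERPT):
    """Sum the substitution scores column by column (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


# Identities (A/A, W/W) and conservative substitutions (L/I, D/E) score
# positively; the unlikely G/W substitution scores negatively.
total = ungapped_score("AWLDG", "AWIEW")
print(total)  # 4 + 11 + 2 + 2 - 2 = 17
```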
| |
| | |
| To calculate a BLOSUM matrix, the following equation is used:
| |
| :<math>S_{ij} = \frac{1}{\lambda} \log \left( \frac{p_{ij}}{q_i \, q_j} \right)</math>
| | |
| Here, <math>p_{ij}</math> is the probability of two amino acids <math>i</math> and <math>j</math> replacing each other in homologous sequences, and <math>q_i</math> and <math>q_j</math> are the background probabilities of finding the amino acids <math>i</math> and <math>j</math> in any protein sequence. The factor <math>\lambda</math> is a scaling factor, chosen so that the scores, once rounded, are integer values that are easy to compute with.
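
The equation can be worked through on toy data. The sketch below is a simplified illustration, not the original BLOSUM program: it assumes unordered pair counts over a hypothetical two-letter alphabet, derives the background probabilities from those counts, uses the natural logarithm, and sets <math>\lambda = \ln(2)/2</math> so that scores come out in half-bit units before rounding.

```python
import math
from collections import defaultdict


def log_odds_matrix(pair_counts, lam=math.log(2) / 2):
    """Compute rounded log-odds scores S_ij = (1/lam) * ln(p_ij / (q_i * q_j)).

    pair_counts maps unordered pairs (i, j) to observed counts. The default
    lam gives half-bit units; that choice is an assumption of this sketch.
    """
    total = sum(pair_counts.values())

    # Symmetric joint probabilities p_ij over ordered pairs: an unordered
    # count for i != j is split evenly between (i, j) and (j, i).
    joint = defaultdict(float)
    for (i, j), n in pair_counts.items():
        if i == j:
            joint[(i, i)] += n / total
        else:
            joint[(i, j)] += n / (2 * total)
            joint[(j, i)] += n / (2 * total)

    # Background probability q_i is the marginal of the joint distribution.
    background = defaultdict(float)
    for (i, _j), p in joint.items():
        background[i] += p

    scores = {}
    for (i, j), p in joint.items():
        if p > 0:  # pairs never observed are skipped rather than scored
            ratio = p / (background[i] * background[j])
            scores[(i, j)] = round(math.log(ratio) / lam)
    return scores


# Hypothetical counts: 60 A/A pairs, 20 A/B pairs, 20 B/B pairs.
toy_counts = {("A", "A"): 60, ("A", "B"): 20, ("B", "B"): 20}
scores = log_odds_matrix(toy_counts)
print(scores[("A", "A")], scores[("A", "B")], scores[("B", "B")])  # 1 -2 2
```

Here the A/B pair is observed less often than its background frequencies predict (0.1 against 0.7 × 0.3 = 0.21), so its score is negative, while both identities are over-represented and score positively.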
| |
| | |
| An article in [[Nature Biotechnology]]<ref name=article>{{cite journal| year=2008| journal=Nat. Biotech. | volume=26 | pages=274–275| title = BLOSUM62 miscalculations improve search performance | doi = 10.1038/nbt0308-274 | url=http://www.nature.com/nbt/journal/v26/n3/full/nbt0308-274.html | author = Mark P Styczynski | coauthors = Kyle L Jensen, Isidore Rigoutsos, Gregory Stephanopoulos| pmid=18327232| issue=3 }}</ref> revealed that the BLOSUM62 matrix used for many years as a standard does not exactly match the matrix that the algorithm described by Henikoff and Henikoff would produce.<ref name=henikoff /> Surprisingly, the miscalculated BLOSUM62 improves search performance.<ref name=article />
| |
| | |
| ==See also==
| |
| * [[Sequence alignment]]
| |
| * [[Point accepted mutation]]
| |
| | |
| == References ==
| |
| {{reflist}}
| |
| | |
| == External links ==
| |
| * {{cite journal |journal=Nature Biotechnology | title=Where did the BLOSUM62 alignment score matrix come from? | author=Sean R. Eddy | doi=10.1038/nbt0804-1035 | pmid=15286655 | year=2004 | volume=22 | pages=1035 | issue=8}}
| |
| * [http://blocks.fhcrc.org/ BLOCKS WWW server]
| |
| * [http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html Scoring systems for BLAST at NCBI]
| |
| * [ftp://ftp.ncbi.nih.gov/blast/matrices/ Data files of BLOSUM on the NCBI FTP server]
| |
| * [http://ahmetrasit.com/blosum/ Interactive BLOSUM Network Visualization]
| |
| | |
| [[Category:Genetics]]
| |
| [[Category:Bioinformatics]]
| |
| [[Category:Biochemistry methods]]
| |
| [[Category:Computational phylogenetics]]
| |
| [[Category:Matrices]]
| |
| | |
| [[ko:블로섬]]
| |