One-repetition maximum: Difference between revisions

[[File:Point mutations-en.png|thumb|An example of point mutations at an amino acid site coding for [[lysine]]. The [[missense mutations]] may be classed as point accepted mutations if the mutated protein is not rejected by natural selection.]]
 
A '''point accepted mutation''' — also known as a PAM — is the replacement of a single [[amino acid]] in the [[Protein primary structure|primary structure]] of a [[protein]] with another single amino acid, which is accepted by the processes of [[natural selection]]. This definition does not include all [[point mutations]] in the [[DNA]] of an organism. In particular, [[silent mutations]] are not point accepted mutations, nor are mutations which are lethal or which are rejected by natural selection in other ways.
 
A PAM matrix is a [[Matrix (mathematics)|matrix]] where each column and row represents one of the twenty standard amino acids. In [[bioinformatics]], PAM matrices are regularly used as [[substitution matrix|substitution matrices]] to score [[sequence alignment]]s for proteins. Each entry in a PAM matrix indicates the likelihood of the amino acid of that row being replaced with the amino acid of that column through a series of one or more point accepted mutations during a specified evolutionary interval, rather than these two amino acids being aligned due to chance. Different PAM matrices correspond to different lengths of time in the evolution of the protein sequence.
 
==Biological Background==
The genetic instructions of every replicating [[Cell (biology)|cell]] in a living organism are contained within its DNA.<ref name=campbell-ch16/> Throughout the cell's lifetime, this information is transcribed and replicated by cellular mechanisms to produce proteins or to provide instructions for daughter cells during [[cell division]], and the possibility exists that the DNA may be altered during these processes.<ref name=campbell-ch16>{{Cite book| title=Biology: Australian Version| author =Campbell NA, Reece JB, Meyers N| coauthor= Urry LA, Cain ML, Wasserman SA, Minorsky PV, Jackson RB| contribution=The Molecular Basis of Inheritance| edition=8th| year=2009| pages=307–325| publisher=Pearson Education Australia| isbn=9781442502215 }}</ref><ref name=campbell-ch17>{{Cite book| title=Biology: Australian Version| author =Campbell NA, Reece JB, Meyers N| coauthor= Urry LA, Cain ML, Wasserman SA, Minorsky PV, Jackson RB| contribution=From Gene to Protein| edition=8th| year=2009| pages=327–350| publisher=Pearson Education Australia| isbn=9781442502215 }}</ref> This is known as a [[mutation]]. At the molecular level, there are regulatory systems that correct most — but not all — of these changes to the DNA before it is replicated.<ref name=campbell-ch17/><ref name=pal>{{Cite book| title=Fundamentals of Molecular Biology| author =Pal JK, Ghaskadbi SS| contribution=DNA Damage, Repair and Recombination| edition=1st| year=2009| pages=187–203| publisher=Oxford University Press| isbn=9780195697810 }}</ref>
 
One of the possible mutations that occurs is the replacement of a single [[nucleotide]], known as a point mutation. If a point mutation occurs within an expressed region of a [[gene]], an [[exon]], then this will change the [[codon]] specifying a particular amino acid in the protein produced by that gene.<ref name=campbell-ch17/> Despite the redundancy in the genetic code, there is a possibility that this mutation will then change the amino acid that is produced during [[Translation (genetics)|translation]], and as a consequence the structure of the protein will be changed.
 
The functionality of a protein is highly dependent on its structure.<ref name=campbell-ch5>{{Cite book| title=Biology: Australian Version| author =Campbell NA, Reece JB, Meyers N| coauthor= Urry LA, Cain ML, Wasserman SA, Minorsky PV, Jackson RB| contribution=The Structure and Function of Large Biological Molecules| edition=8th| year=2009| pages=68–89| publisher=Pearson Education Australia| isbn=9781442502215 }}</ref> Changing a single amino acid in a protein may reduce its ability to carry out this function, or the mutation may even change the function that the protein carries out.<ref name=campbell-ch17/> Changes like these may severely impact a crucial function in a cell, potentially causing the cell — and in extreme cases, the organism — to die.<ref>{{cite journal|url=http://www.nature.com/scitable/topicpage/mendelian-ratios-and-lethal-genes-557 |title=Mendelian Ratios and Lethal Genes |first=Ingrid |last=Lobo |work=[[Nature (journal)|Nature]] |publisher=[[Nature Publishing Group]] |year=2008 |accessdate=19 October 2013}}</ref> Conversely, the change may allow the cell to continue functioning albeit differently, and the mutation can be passed on to the organism's offspring. If this change does not result in any significant physical disadvantage to the offspring, the possibility exists that this mutation will persist within the population. The possibility also exists that the change in function becomes advantageous. In either case, while being subjected to the processes of natural selection, the point mutation has been accepted into the genetic pool.
 
The 20 amino acids translated by the genetic code vary greatly in the physical and chemical properties of their side chains.<ref name=campbell-ch5/> However, these amino acids can be categorised into groups with similar physicochemical properties.<ref name=campbell-ch5/> Substituting an amino acid with another from the same category is likely to have a smaller impact on the structure and function of a protein than replacement with an amino acid from a different category. Consequently, acceptance of point mutations depends heavily on the amino acid being replaced in the mutation, and the replacement amino acid. The PAM matrices are a mathematical tool that accounts for these varying rates of acceptance when evaluating the similarity of proteins during alignment.
 
==Terminology==
The term ''accepted point mutation'' was initially used to describe the mutation phenomenon. However, the acronym PAM was preferred over APM due to readability, and so the term ''point accepted mutation'' is used more regularly.<ref name=Pevsner-ch3>{{Cite book| title=Bioinformatics and Functional Genomics | edition = 2nd | pages=58–68 | year=2009 | contribution=Pairwise Sequence Alignment | author=Pevsner J | publisher=Wiley-Blackwell | isbn=978-0-470-08585-1}}</ref> Because the value <math>n</math> in the PAM<sub>n</sub> matrix represents the number of mutations per 100 amino acids, which can be likened to a percentage of mutations, the term ''percentage accepted mutation'' is sometimes used.
 
It is important to distinguish between point accepted mutations (PAMs), point accepted mutation matrices (PAM matrices) and the PAM<sub>n</sub> matrix. The term 'point accepted mutation' refers to the mutation event itself. However, 'PAM matrix' refers to one of a family of matrices which contain scores representing the likelihood of two amino acids being aligned due to a series of mutation events, rather than due to random chance. The 'PAM<sub>n</sub> matrix' is the PAM matrix corresponding to a time frame long enough for <math>n</math> mutation events to occur per 100 amino acids.
 
==Construction of PAM matrices==
 
PAM matrices were introduced by [[Margaret Dayhoff]] in 1978.<ref name = dayhoff1978>{{Cite book| title=Atlas of protein sequence and structure |edition = volume 5, supplement 3 | pages=345–358 | year=1978 |contribution= A model of Evolutionary Change in Proteins | author=Dayhoff, M.O., Schwartz, R. and Orcutt, B.C.| publisher=Nat. Biomed. Res. Found. | isbn =0-912466-07-3| postscript=<!--None-->}}</ref> The calculation of these matrices was based on 1572 observed mutations in the [[phylogenetic trees]] of 71 families of closely related proteins. The proteins to be studied were selected on the basis of having high similarity with their predecessors. The protein alignments included were required to display at least 85% identity.<ref name=Pevsner-ch3/><ref name=sung>{{Cite book| title=Algorithms in Bioinformatics: A Practical Introduction| pages=51–52 | year=2010 | author=Wing-Kin Sung | publisher=CRC Press | isbn=978-1-4200-7033-0}}</ref> As a result, it is reasonable to assume that any aligned mismatches were the result of a single mutation event, rather than several at the same location.
 
Each PAM matrix has twenty rows and twenty columns — one representing each of the twenty amino acids translated by the genetic code. The value in each cell of a PAM matrix is related to the probability of a column amino acid before the mutation being aligned with a row amino acid afterwards.<ref name=Pevsner-ch3/><ref name=dayhoff1978/><ref name=sung/> From this definition, PAM matrices are an example of a substitution matrix.
 
===Collection of data from phylogenetic trees===
For each branch in the phylogenetic trees of the protein families, the number of mismatches observed was recorded, along with the two amino acids involved in each.<ref name=dayhoff1978/> These counts were used as entries below the main diagonal of the matrix <math>A</math>. Since the vast majority of protein samples come from organisms that are alive today (extant species), the 'direction' of a mutation cannot be determined. That is, the amino acid present before the mutation cannot be distinguished from the amino acid that replaced it after the mutation. Because of this, the matrix <math>A</math> is assumed to be [[Symmetric matrix|symmetric]], and the entries of <math>A</math> above the main diagonal are computed on this basis. The entries along the diagonal of <math>A</math> do not correspond to mutations and can be left unfilled.
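
The tallying step can be sketched in a few lines of Python. This is only an illustration: the three-letter alphabet, the list of observed pairs and the function name <code>build_count_matrix</code> are hypothetical and are not taken from Dayhoff's data.

```python
# Hypothetical three-letter alphabet and made-up mismatch observations.
ALPHABET = ["A", "R", "N"]
observed_pairs = [("A", "R"), ("R", "A"), ("A", "N"), ("R", "N"), ("N", "R")]


def build_count_matrix(alphabet, pairs):
    """Tally unordered mismatch pairs and return a symmetric count matrix."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    size = len(alphabet)
    A = [[0] * size for _ in range(size)]
    for first, second in pairs:
        i, j = index[first], index[second]
        if i == j:
            continue  # identical residues are not mutations
        # The direction is unknown, so tally below the main diagonal only.
        A[max(i, j)][min(i, j)] += 1
    # Mirror the lower triangle into the upper triangle (symmetry assumption).
    for i in range(size):
        for j in range(i):
            A[j][i] = A[i][j]
    return A


A = build_count_matrix(ALPHABET, observed_pairs)
for row in A:
    print(row)
```

The diagonal is left at zero here, which plays the role of the unfilled entries described above.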
 
In addition to these counts, data on the mutability and the frequency of the amino acids were obtained.<ref name=Pevsner-ch3/><ref name=dayhoff1978/> The mutability of an amino acid is the ratio of the number of mutations it is involved in to the number of times it occurs in an alignment.<ref name=dayhoff1978/> Mutability measures how likely an amino acid is to mutate acceptably. [[Asparagine]], an amino acid with a small [[Chemical polarity|polar]] side chain, was found to be the most mutable of the amino acids.<ref name=dayhoff1978/> In contrast, [[cysteine]] and [[tryptophan]] were found to be the least mutable amino acids.<ref name=dayhoff1978/> The side chains of cysteine and tryptophan are more distinctive: cysteine's side chain contains sulfur which participates in [[disulfide bonds]] with other cysteine molecules, and tryptophan's side chain is large and [[Aromaticity|aromatic]].<ref name=campbell-ch5/> Since there are several small polar amino acids, these extremes suggest that amino acids are more likely to acceptably mutate if their physical and chemical properties are more common among alternative amino acids.<ref name=Pevsner-ch3/><ref name=sung/>
 
===Construction of the mutation matrix===
 
For the <math>j</math>th amino acid, the values <math>m(j)</math> and <math>f(j)</math> are its mutability and frequency. The frequencies of the amino acids are normalised so that they sum to 1. If the total number of occurrences of the <math>j</math>th amino acid is <math>n(j)</math>, and <math>N</math> is the total number of all amino acids, then
 
:<math>f(j) = \frac{n(j)}{N}</math>
 
Based on the definition of mutability as the ratio of mutations to occurrences of an amino acid
 
:<math>m(j) = \frac{\sum_{i=1, i\neq j}^{20}A(i,j)}{n(j)}</math>
or
:<math>\frac{1}{Nf(j)} = \frac{1}{n(j)} = \frac{m(j)}{\sum_{i=1, i\neq j}^{20}A(i,j)}</math>
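
As a minimal sketch of these two definitions, the following Python fragment computes <math>f(j)</math> and <math>m(j)</math> for a hypothetical three-letter alphabet. The counts in <code>counts_A</code> and <code>occurrences</code> are invented for illustration and the function names are assumptions.

```python
# Hypothetical data: a symmetric count matrix A(i, j) and occurrences n(j).
counts_A = [
    [0, 30, 10],
    [30, 0, 20],
    [10, 20, 0],
]
occurrences = [5000, 3000, 2000]


def frequencies(n):
    """f(j) = n(j) / N, where N is the total number of amino acids."""
    total = sum(n)
    return [count / total for count in n]


def mutabilities(A, n):
    """m(j) = (sum of A(i, j) over i != j) / n(j)."""
    size = len(n)
    return [
        sum(A[i][j] for i in range(size) if i != j) / n[j]
        for j in range(size)
    ]


f = frequencies(occurrences)
m = mutabilities(counts_A, occurrences)
print(f)
print(m)
```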
 
The mutation matrix <math>M</math> is constructed so that the entry <math>M(i,j)</math> represents the probability of the <math>j</math>th amino acid mutating into the <math>i</math>th amino acid. The non-diagonal entries are computed by the equation<ref name=dayhoff1978/>
 
:<math>M(i,j) = \frac{\lambda m(j)A(i,j)}{\sum_{i=1, i\neq j}^{20}A(i,j)}</math>
 
:<math>M(i,j) = \lambda A(i,j)\frac{m(j)}{\sum_{i=1, i\neq j}^{20}A(i,j)} = \frac{\lambda A(i,j)}{Nf(j)}</math>
 
where <math>\lambda</math> is a constant of proportionality. However, this equation does not compute the diagonal entries. Each column in the matrix <math>M</math> lists each of the twenty possible outcomes for an amino acid: it can mutate into one of the 19 other amino acids, or remain unchanged. Since the probabilities of the 19 possible mutations are given by the non-diagonal entries, and the probabilities of the twenty outcomes must sum to 1, the remaining probability can be calculated by
 
:<math>M(j,j) = 1 - \sum_{i=1, i\neq j}^{20}M(i,j)</math>
 
which simplifies to<ref name=dayhoff1978/>
 
:<math>M(j,j) = 1 - \lambda m(j)</math>
 
:{| class="toccolours collapsible collapsed" width="60%" style="text-align:left"
!Calculation of the diagonal entries
|-
|
:<math>M(j,j) = 1 - \sum_{i=1, i\neq j}^{20}M(i,j)</math>
 
Substituting in the expression for the non-diagonal entries of the mutation matrix:
 
:<math>M(j,j) = 1 - \sum_{i=1, i\neq j}^{20} \frac{\lambda m(j)A(i,j)}{\sum_{i=1, i\neq j}^{20}A(i,j)}</math>
 
:<math>M(j,j) = 1 - \frac{ \sum_{i=1, i\neq j}^{20}\lambda m(j)A(i,j)}{\sum_{i=1, i\neq j}^{20}A(i,j)}</math>
 
Since the values of <math>\lambda</math> and <math>m(j)</math> are constants that don't change with the value of <math>i</math>
 
:<math>M(j,j) = 1 - \frac{\lambda m(j)\sum_{i=1, i\neq j}^{20}A(i,j)}{\sum_{i=1, i\neq j}^{20}A(i,j)}</math>
 
And thus cancellation reveals that
 
:<math>M(j,j) = 1 - \lambda m(j)</math>
|}
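
The construction of <math>M</math> can be sketched as follows. The data are the same kind of hypothetical three-letter counts used purely for illustration, and the value of <code>lam</code> is an arbitrary placeholder for <math>\lambda</math>, not the PAM<sub>1</sub> value.

```python
# Hypothetical symmetric counts A(i, j) and occurrences n(j).
A = [
    [0, 30, 10],
    [30, 0, 20],
    [10, 20, 0],
]
n_occ = [5000, 3000, 2000]


def mutation_matrix(A, n_occ, lam):
    """Build M with M[i][j] = probability that amino acid j mutates into i."""
    size = len(n_occ)
    M = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for i in range(size):
            if i != j:
                # lam * A(i, j) / (N f(j)), using N f(j) = n(j)
                M[i][j] = lam * A[i][j] / n_occ[j]
        # The column must sum to 1, so the diagonal takes the remainder.
        M[j][j] = 1.0 - sum(M[i][j] for i in range(size) if i != j)
    return M


lam = 0.5  # arbitrary illustrative constant of proportionality
M = mutation_matrix(A, n_occ, lam)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
print(M)
print(column_sums)
```

With these numbers the first diagonal entry comes out as <math>1 - \lambda m(1) = 1 - 0.5 \times 0.008</math>, in agreement with the simplified formula.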
 
A result of particular significance is that for the non-diagonal entries
 
:<math>f(j) M(i,j) = \frac{\lambda}{N} A(i,j) = \frac{\lambda}{N} A(j,i) = f(i) M(j,i)</math>
 
Which means that for all entries in the mutation matrix
 
:<math>f(j) M(i,j) = f(i) M(j,i)</math>
 
====Choice of the constant of proportionality====
The probabilities contained in <math>M</math> vary as some unknown function of the amount of time that a protein sequence is allowed to mutate for. Instead of attempting to determine this relationship, the values of <math>M</math> are calculated for a short time frame, and the matrices for longer periods of time are calculated by assuming mutations follow a [[Markov chain]] model.<ref name=Kosiol>{{cite journal | journal=Molecular biology and evolution. | volume=22 | issue=2 | pages=193–9 | year=2005  | author=Kosiol C, Goldman N. | title=Different versions of the Dayhoff rate matrix. | pmid=15483331 | doi = 10.1093/molbev/msi005 |url=http://mbe.oxfordjournals.org/cgi/content/full/22/2/193}}</ref><ref name=Lio>{{cite journal | journal=Genome Research. | volume=8 | issue=12 | pages=1233–44 | year=1998  | author=Liò P, Goldman N. | title=Models of molecular evolution and phylogeny. | pmid=9872979  | doi = 10.1101/gr.8.12.1233 | url=http://pixfunlobdot.59.to/content/8/12/1233.full }}</ref> The base unit of time for the PAM matrices is the time required for 1 mutation to occur per 100 amino acids, sometimes called 'a PAM unit' or 'a PAM' of time.<ref name=Pevsner-ch3/> This is precisely the duration of mutation assumed by the PAM<sub>1</sub> matrix.
 
The constant <math>\lambda</math> is used to control the proportion of amino acids that are unchanged. By using only alignments of proteins that had at least 85% similarity, it could be reasonably assumed that the mutations observed were direct, without any intermediate states. This means that scaling down these counts by a common factor would provide an accurate estimate of the mutation counts had the similarity been closer to 100%. It also means that, over such a short interval, the number of mutations per 100 amino acids (the <math>n</math> in PAM<sub>n</sub>) is equal to the number of mutated amino acids per 100 amino acids.
 
To find the mutation matrix for the PAM<sub>1</sub> matrix, the requirement that 99% of the amino acids in a sequence are conserved is imposed. The quantity <math>n(j)M(j,j)</math> is equal to the number of conserved amino acid <math>j</math> units, and so the total number of conserved amino acids is
 
:<math>\sum_{j=1}^{20}n(j)M(j,j) = \sum_{j=1}^{20}n(j) - \lambda \sum_{j=1}^{20}n(j)m(j) = N - N\lambda \sum_{j=1}^{20}f(j)m(j)</math>
 
The value of <math>\lambda</math> that must be chosen to produce 99% identity after mutation is then given by the equation
 
:<math>0.99 = 1 - \lambda\sum_{j=1}^{20}f(j)m(j)</math>
 
This <math>\lambda</math> value can then be used in the mutation matrix for the PAM<sub>1</sub> matrix.
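
Solving the equation above for <math>\lambda</math> gives <math>\lambda = 0.01 / \sum_j f(j)m(j)</math>. A short sketch, using hypothetical frequencies and mutabilities for a three-letter alphabet (the function name <code>pam1_lambda</code> is an assumption):

```python
def pam1_lambda(f, m, conserved=0.99):
    """Return lambda such that sum_j f(j) * (1 - lambda * m(j)) == conserved."""
    return (1.0 - conserved) / sum(fj * mj for fj, mj in zip(f, m))


# Hypothetical frequencies f(j) and mutabilities m(j).
f = [0.5, 0.3, 0.2]
m = [0.008, 50 / 3000, 0.015]

lam = pam1_lambda(f, m)
# Proportion of amino acids left unchanged, using M(j, j) = 1 - lambda * m(j).
conserved_fraction = sum(fj * (1.0 - lam * mj) for fj, mj in zip(f, m))
print(lam, conserved_fraction)
```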
 
===Construction of the PAM<sub>n</sub> matrices===
 
The Markov chain model of protein mutation relates the mutation matrix for PAM<sub>n</sub>, <math>M_{n}</math>, to the mutation matrix for the PAM<sub>1</sub> matrix, <math>M_{1}</math> by the simple relationship
 
:<math>M_{n} = M_{1}^{n}</math>
 
The PAM<sub>n</sub> matrix is constructed from the ratio of the probability of point accepted mutations replacing the <math>j</math>th amino acid  with the <math>i</math>th amino acid, to the probability of these amino acids being aligned by chance.
The entries of the PAM<sub>n</sub> matrix are given by the equation<ref name=Gusfield>{{cite book|last=Gusfield|first=Dan|title=Algorithms on String, Trees, and Sequences -Computer Science and Computational Biology|pages=383–384|publisher=Cambridge University Press | year=1997 |isbn=0521585198}}</ref><ref name=Boecken>{{cite book|last=Boeckenhauer|first=Hans-Joachim|coauthors=Dirk Bongartz|title=Algorithmic Aspects of Bioinformatics|pages=94–96|publisher=Springer | year=2010 | isbn=3642091008}}</ref>
:<math>\text{PAM}_n(i,j) = \log \frac{f(j)M_{n}(i,j)}{f(i)f(j)} = \log \frac{f(j)M^n(i,j)}{f(i)f(j)} = \log \frac{M^n(i,j)}{f(i)}</math>
 
Note that in Gusfield's book, the entries <math>M(i,j)</math> and <math>\text{PAM}_n(i,j)</math> are related to the probability of the <math>i</math>th amino acid mutating into the <math>j</math>th amino acid.<ref name=Gusfield/> This is the origin of the different equation for the entries of the PAM matrices.
 
When using the PAM<sub>n</sub> matrix to score an alignment of two proteins, the following assumption is made:
::''If these two proteins are related, the evolutionary interval separating them is the time taken for <math>n</math> point accepted mutations to occur per 100 amino acids.''
When the alignment of the <math>i</math>th and <math>j</math>th amino acids is considered, the score indicates the relative likelihoods of the alignment due to the proteins being related or due to random chance.
* If the proteins are related, a series of point accepted mutations must have occurred to mutate the original amino acid into its replacement. Suppose the <math>j</math>th amino acid is the original. Based on the abundance of amino acids in proteins, the probability of the <math>j</math>th amino acid being the original is <math>f(j)</math>. Given any particular unit of this amino acid, the [[Conditional probability|probability]] of being replaced by the <math>i</math>th amino acid in the assumed time interval is <math>M_n(i,j)</math>. Thus, the probability of the alignment is <math>f(j)M_n(i,j)</math>, the numerator within the logarithm.
* If the proteins are not related, the events that the two aligned amino acids are the <math>i</math>th and <math>j</math>th amino acids must be [[Independence (probability theory)|independent]]. The probabilities of these events are <math>f(i)</math> and <math>f(j)</math>, which means the probability of the alignment is <math>f(i)f(j)</math>, the denominator of the logarithm.
* Thus, the logarithm in the equation results in a positive entry if the alignment is more likely due to point accepted mutations, and a negative entry if the alignment is more likely due to chance.
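
The whole pipeline, from counts to log-odds scores, can be sketched for a toy case. Everything below is illustrative: the three-letter counts are invented, the helper names are assumptions, and the base-10 logarithm is one possible choice since the formula above does not fix the base.

```python
import math

# Hypothetical symmetric counts A(i, j) and occurrences n(j).
A = [[0, 30, 10], [30, 0, 20], [10, 20, 0]]
n_occ = [5000, 3000, 2000]


def pam1_mutation_matrix(A, n_occ):
    """Mutation matrix scaled so that 99% of amino acids are conserved."""
    size, N = len(n_occ), sum(n_occ)
    total = sum(A[i][j] for i in range(size) for j in range(size) if i != j)
    lam = 0.01 * N / total  # equivalent to 0.01 / sum_j f(j) m(j)
    M = [[lam * A[i][j] / n_occ[j] for j in range(size)] for i in range(size)]
    for j in range(size):
        M[j][j] = 1.0 - sum(M[i][j] for i in range(size) if i != j)
    return M


def mat_mul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(M, n):
    """M raised to the positive integer power n by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = mat_mul(result, M)
    return result


def pam_matrix(M1, f, n):
    """Log-odds entries log10(M^n(i, j) / f(i))."""
    Mn = mat_pow(M1, n)
    size = len(f)
    return [[math.log10(Mn[i][j] / f[i]) for j in range(size)]
            for i in range(size)]


f = [count / sum(n_occ) for count in n_occ]
M1 = pam1_mutation_matrix(A, n_occ)
pam1 = pam_matrix(M1, f, 1)
pam250 = pam_matrix(M1, f, 250)
for row in pam250:
    print([round(score, 3) for score in row])
```

In the toy PAM<sub>1</sub> matrix the diagonal scores are positive and the off-diagonal scores are negative, matching the interpretation given in the list above.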
 
==Properties of the PAM matrices==
 
===Symmetry of the PAM matrices===
 
While the mutation probability matrix <math>M</math> is not symmetric, each of the PAM matrices is.<ref name=Pevsner-ch3/><ref name=dayhoff1978/> This somewhat surprising property is a result of the relationship that was noted for the mutation probability matrix:

:<math> f(j)M(i,j) = f(i)M(j,i)</math>

In fact, this relationship holds for all positive integer powers of the matrix <math>M</math>:

:<math> f(j)M^n(i,j) = f(i)M^n(j,i)</math>
:{| class="toccolours collapsible collapsed" width="60%" style="text-align:left"
!Generalisation of property to positive integer matrix powers
|-
|
This generalisation can be proven using [[mathematical induction]]. Suppose that for a matrix <math>M</math>

:<math> f(j)M(i,j) = f(i)M(j,i)</math>

And that for a positive integer <math>k</math>

:<math> f(j)M^k(i,j) = f(i)M^k(j,i)</math>

By expansion of the [[Matrix multiplication|matrix product]] <math>M^{k+1}=M^k \cdot M</math>, with the index <math>l</math> running over the twenty amino acids,

:<math> f(j)M^{k+1}(i,j) = f(j)\sum_{l=1}^{20} M^k(i,l)M(l,j)</math>

:<math> f(j)M^{k+1}(i,j) = \sum_{l=1}^{20} M^k(i,l)(f(j)M(l,j))</math>

Using the property for the matrix <math>M</math>

:<math> f(j)M^{k+1}(i,j) = \sum_{l=1}^{20} M^k(i,l)(f(l)M(j,l))</math>

:<math> f(j)M^{k+1}(i,j) = \sum_{l=1}^{20} (f(l)M^k(i,l))M(j,l)</math>

And using the property we have assumed of the matrix <math>M^k</math>

:<math> f(j)M^{k+1}(i,j) = \sum_{l=1}^{20} (f(i)M^k(l,i))M(j,l)</math>

:<math> f(j)M^{k+1}(i,j) = f(i) \sum_{l=1}^{20} M(j,l)M^k(l,i)</math>

:<math> f(j)M^{k+1}(i,j) = f(i)M^{k+1}(j,i)</math>

In this case, it is only known at first that the result holds for <math>k=1</math>. However, the above argument shows that the property also holds for <math>k=2</math>. This new knowledge then shows that the property also holds for <math>k=3</math>, and this repeats to show that the property holds for all positive integers <math>k</math>.
|}

As a result, the entries of the PAM<sub>n</sub> matrix are symmetric, since

:<math>\text{PAM}_n(i,j) = \log \frac{f(j)M^n(i,j)}{f(i)f(j)} = \log \frac{f(i)M^n(j,i)}{f(j)f(i)} = \text{PAM}_n(j,i) </math>
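
The relationship between the frequencies and the powers of <math>M</math> can be checked numerically. The sketch below uses a hypothetical two-state matrix (columns sum to 1) with frequencies chosen so that the relationship holds for the first power; it is an illustration, not a substitute for the induction argument.

```python
def mat_mul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def satisfies_relation(P, f, tol=1e-12):
    """Check f(j) * P(i, j) == f(i) * P(j, i) for every pair (i, j)."""
    size = len(f)
    return all(abs(f[j] * P[i][j] - f[i] * P[j][i]) <= tol
               for i in range(size) for j in range(size))


# Hypothetical two-state mutation matrix; entry [i][j] is P(j mutates to i).
M = [[0.9, 0.2],
     [0.1, 0.8]]
f = [2 / 3, 1 / 3]  # frequencies for which f(0) * M[1][0] == f(1) * M[0][1]

power = M
results = []
for k in range(1, 6):
    results.append(satisfies_relation(power, f))  # tests the k-th power of M
    power = mat_mul(power, M)
print(results)
```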
 
===Relating the number of mutated amino acids and the number of mutations===
The value <math>n</math> represents the number of mutations that occur per 100 amino acids, but this value is rarely directly accessible and usually has to be estimated. When comparing two proteins, it is easy to calculate <math>m</math> instead, which is the number of mutated amino acids per 100 amino acids. Despite the random nature of mutation, these values can be approximately related by<ref name=Pevsner-ch7>{{Cite book| title=Bioinformatics and Functional Genomics | edition = 2nd | pages=221–227 | year=2009 | contribution=Molecular Phylogeny and Evolution | author=Pevsner J | publisher=Wiley-Blackwell | isbn=978-0-470-08585-1}}</ref>
 
:<math>\frac{m}{100} = 1 - e^{-\frac{n}{100}}</math>
 
:{| class="toccolours collapsible collapsed" width="60%" style="text-align:left"
!Derivation of relationship between <math>m</math> and <math>n</math>
|-
|
Mutations in the primary structure of a protein can occur anywhere along the sequence. If it is assumed the distribution of the mutations among amino acid positions is uniform, the problem is analogous to a distribution of "balls into bins", a common problem in [[combinatorics]]. In a case where <math>K</math> balls (i.e. mutations) are distributed amongst <math>N</math> bins (amino acid positions), the number of bins containing at least one ball, <math>M</math> has a distribution with a mean given by<ref>{{cite book| last=Motwani| first=Rajeev| coauthors=Prabhakar Raghavan| title=Randomized Algorithms| pages=94| publisher=Cambridge University Press| year=1995| isbn=0521474655| url=http://books.google.com.au/books?id=QKVY4mDivBEC&pg=PA94#v=onepage&q&f=false}}</ref>
 
:<math>E(M) = N - N(1 - \frac{1}{N})^{K}</math>
:<math>\frac{E(M)}{N} = 1 - (1 - \frac{1}{N})^{K}</math>
 
If the rate of mutation is <math>n</math> mutations per 100 amino acids, then
 
:<math>\frac{n}{100} = \frac{K}{N} </math>
 
And if there are <math>m</math> mutated amino acids per 100 amino acids, then this proportion is approximately equal to the expected proportion of positions that have mutated
 
:<math>\frac{m}{100} = \frac{E(M)}{N}</math>
 
Now <math>m</math> and <math>n</math> can be related by
 
:<math>\frac{m}{100} = 1 - (1 - \frac{1}{N})^{\frac{nN}{100}}</math>
 
For large values of <math>N</math>, an assumption that can be reasonably made for typical proteins, this expression is approximately equal to
 
:<math>\frac{m}{100} = 1 - e^{-\frac{n}{100}}</math>
|}
 
The validity of these estimates can be verified by counting the number of amino acids that remain unchanged under the action of the matrix <math>M</math>. The total number of unchanged amino acids for the time interval of the PAM<sub>n</sub> matrix is
 
:<math>\sum_{j=1}^{20}n(j)M^n(j,j)</math>
 
and so the proportion of unchanged amino acids is
 
:<math>\frac{\sum_{j=1}^{20}n(j)M^n(j,j)}{N} = \sum_{j=1}^{20}f(j)M^n(j,j) = 1 - \frac{m}{100}</math>
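
The approximate relationship between <math>m</math> and <math>n</math> is easy to evaluate. The following sketch computes it in both directions and compares it with the finite-length "balls into bins" expression from the derivation; the function names and the sequence length of 10000 are assumptions chosen for illustration.

```python
import math


def mutated_percent(n):
    """m = 100 * (1 - exp(-n / 100)): mutated amino acids per 100."""
    return 100.0 * (1.0 - math.exp(-n / 100.0))


def mutations_from_mutated(m):
    """Inverse relation: n = -100 * ln(1 - m / 100), valid for 0 <= m < 100."""
    return -100.0 * math.log(1.0 - m / 100.0)


def mutated_percent_finite(n, length):
    """Finite-length expression 1 - (1 - 1/N)^(n N / 100), as a percentage."""
    return 100.0 * (1.0 - (1.0 - 1.0 / length) ** (n * length / 100.0))


for n in (1, 100, 250):
    print(n, round(mutated_percent(n), 2),
          round(mutated_percent_finite(n, 10000), 2))
```

For small <math>n</math> the two quantities nearly coincide, which is consistent with treating each observed mismatch as a single mutation in closely related proteins.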
 
==An example - PAM250==
 
PAM250 is a commonly used scoring matrix for sequence comparison. Only the lower half of the matrix needs to be computed, since by their construction, PAM matrices are required to be symmetric. Each of the 20 amino acids is shown along the top and side of the matrix, together with 3 additional [[amino acids#Physicochemical properties of amino acids|ambiguous amino acids]]. The amino acids are most commonly listed alphabetically or in groups. These [[amino acids#Physicochemical properties of amino acids|groups]] reflect the characteristics shared among the amino acids.<ref name=dayhoff1978/>
 
==Uses in bioinformatics==
 
===Determining the time of divergence in phylogenetic trees===
The [[molecular clock hypothesis]] predicts that the rate of amino acid substitution in a particular protein will be approximately constant over time, though this rate may vary between protein families.<ref name=Pevsner-ch7/> This suggests that the number of mutations per amino acid in a protein increases approximately linearly with time.
 
Determining the time at which two proteins diverged is an important task in [[phylogenetics]]. [[Fossil#Estimating dates|Fossil records]] are often used to establish the position of events on the timeline of the Earth's evolutionary history, but the application of this source is [[Fossil#Limitations|limited]]. However, if the rate at which the molecular clock of protein family ticks — that is, the rate at which the number of mutations per amino acid increases — is known, then knowing this number of mutations would allow the date of divergence to be found.
 
Suppose the date of divergence for two related proteins, taken from organisms living today, is sought. The two proteins have both been accumulating accepted mutations since the date of divergence, and so the total number of mutations per amino acid separating them is approximately twice that which separates them from their [[Common descent|common ancestor]]. If a range of PAM matrices are used to align two proteins that are known to be related, then the value of <math>n</math> in the PAM<sub>n</sub> matrix which results in the best score is most likely to correspond to the mutations per amino acid separating the two proteins. Halving this value and dividing by the rate at which accepted mutations accumulate in the protein family provides an estimate of the time of divergence of these two proteins from their common ancestor. That is, the time of divergence in [[myr]] is<ref name = Pevsner-ch7/>
 
:<math>T = \frac{K}{2r}</math>
 
Where <math>K</math> is the number of mutations per amino acid, and <math>r</math> is the rate of accepted mutation accumulation in mutations per amino acid site per million years.
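
As a worked illustration of this formula, the sketch below converts a PAM distance into a divergence time. The numbers are hypothetical: a best-scoring matrix of PAM<sub>50</sub> (so <math>K = 0.5</math> mutations per amino acid) and an assumed rate of 0.001 mutations per site per million years.

```python
def divergence_time_myr(K, r):
    """T = K / (2 r), with K in mutations per amino acid and
    r in mutations per amino acid site per million years."""
    if r <= 0:
        raise ValueError("the rate r must be positive")
    return K / (2.0 * r)


best_pam_n = 50          # hypothetical best-scoring PAM_n matrix
K = best_pam_n / 100.0   # n mutations per 100 amino acids
r = 0.001                # hypothetical rate for the protein family
T = divergence_time_myr(K, r)
print(T)  # time since divergence, in millions of years
```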
 
===Use in BLAST===
PAM matrices are also used as scoring matrices when comparing protein sequences, to judge the quality of an alignment. This form of scoring system is utilized by a wide range of alignment software including [[BLAST]].<ref>{{cite web |url =http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html#head9 |title =The Statistics of Sequence Similarity Scores |website =National Centre for Biotechnology Information |accessdate =20 October 2013}}</ref>
 
====Comparing PAM and BLOSUM====
In addition to PAM matrices, another more recently developed family of scoring matrices can be used, known as [[BLOSUM]]. The two families serve the same purpose of scoring amino acid substitutions, but are derived using differing methodologies. BLOSUM matrices are based directly on mutations observed in motifs of related sequences, while PAM matrices [[extrapolate]] evolutionary information from closely related sequences.<ref>{{cite journal | journal=Proceedings of the National Academy of Sciences of the United States of America | volume=89 | issue=22 | pages=10915–10919 | year=1992  | author=Henikoff S, Henikoff J G. | title=Amino acid substitution matrices from protein blocks. |url=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC50453/pdf/pnas01096-0363.pdf | pmc=50453}}</ref>
 
Since PAM and BLOSUM matrices are different methods of expressing the same kind of scoring information, the two can be compared. However, because the scores are obtained in very different ways, a PAM100 '''does not equal''' a BLOSUM100.<ref>{{cite web |url =http://www.birec.org/sandbox/omamasaudtutorial |title =PAM and BLOSSUM SUBSITUTION MATRICES |last =Saud |first =Omama |year =2009 |website =Birec |accessdate =20 October 2013}}</ref> The table below pairs each PAM matrix with its approximately comparable BLOSUM matrix.
{| class="wikitable" style="margin: 1em auto 1em auto; text-align:center; width: 30%;"
|-
! PAM !! BLOSUM
|-
| PAM100 || BLOSUM90
|-
| PAM120 || BLOSUM80
|-
| PAM160 || BLOSUM60
|-
| PAM200 || BLOSUM52
|-
| PAM250 || BLOSUM45
|}
 
==See also==
* [[Point mutation]]
* [[Sequence alignment]]
* [[Margaret Dayhoff]]
* [[Molecular clock]]
* [[BLOSUM]]
* [[BLAST]]
 
==References==
{{reflist|2}}
 
==External links==
* http://www.inf.ethz.ch/personal/gonnet/DarwinManual/node148.html
* http://www.bioinformatics.nl/tools/pam.html For quickly calculating a PAM matrix.
* http://web.expasy.org/docs/relnotes/relstat.html The most recent statistics from the Swiss-Prot protein knowledgebase. Section 6.1 contains the most up-to-date amino acid frequencies
 
[[Category:Genetics]]
[[Category:Bioinformatics]]

Latest revision as of 22:32, 20 November 2014
