Variable-length code: Difference between revisions

In [[protein structure prediction]], a '''statistical potential''' or '''knowledge-based potential''' is an energy function derived from an analysis of known protein structures in the [[Protein Data Bank]].
 
Many methods exist to obtain such potentials; two notable methods are the ''quasi-chemical approximation'' (due to Miyazawa and Jernigan<ref>Miyazawa S, Jernigan R (1985) Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18: 534–552.</ref>) and the ''potential of mean force'' (due to Sippl<ref name="Sippl_a">Sippl MJ (1990) Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol 213: 859–883.</ref>). Although the resulting energies are often regarded as approximations of the [[Thermodynamic free energy|free energy]], this physical interpretation is incorrect.<ref name="Thomas">Thomas PD, Dill KA (1996) Statistical potentials extracted from protein structures: how accurate are they? J Mol Biol 257: 457–469.</ref><ref name="BenNaim">Ben-Naim A (1997) Statistical potentials extracted from protein structures: Are these meaningful potentials? J Chem Phys 107: 3698–3706.</ref> Nonetheless, they have been applied with limited success in many cases<ref name="ratio">{{cite journal |author=Hamelryck T, Borg M, Paluszewski M, ''et al.'' |title=Potentials of mean force for protein structure prediction vindicated, formalized and generalized |journal=PLoS ONE |volume=5 |issue=11 |pages=e13714 |year=2010 |pmid=21103041 |pmc=2978081 |doi=10.1371/journal.pone.0013714 |url= |editor1-last=Flower |editor1-first=Darren R.}}</ref> because they frequently correlate with actual (physical) [[Gibbs energy|free energy]] differences.
 
==Assigning an energy==
Possible features to which an energy can be assigned include torsion angles (such as the <math>\phi, \psi</math> angles of the [[Ramachandran plot]]), solvent exposure and [[hydrogen bond]] geometry. The classic application of such potentials is, however, to pairwise amino acid contacts or distances. For pairwise amino acid contacts, a statistical potential is formulated as an interaction [[matrix (mathematics)|matrix]] that assigns a weight or energy value to each possible pair of [[list of standard amino acids|standard amino acids]]. The energy of a particular structural model is then the combined energy of all pairwise contacts (defined as two amino acids within a certain distance of each other) in the structure. The energies are determined using statistics on amino acid contacts in a database of known protein structures (obtained from the [[Protein Data Bank]]).
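
The contact-based scoring described above can be sketched as follows. This is a minimal illustration, not a published potential: the three-letter alphabet, the energy values, the 6.5 distance cutoff, and the rule that skips sequence neighbours are all assumptions made for the example.

```python
import math
from itertools import combinations

# Hypothetical contact energies (arbitrary units) for a toy three-letter
# alphabet; a real interaction matrix would cover all 20 standard amino acids.
CONTACT_ENERGY = {
    ("A", "A"): -0.1, ("A", "L"): -0.4, ("A", "K"): 0.2,
    ("L", "L"): -1.0, ("K", "L"): 0.3, ("K", "K"): 0.5,
}

def pair_energy(a, b):
    """Look up the symmetric contact energy for residue types a and b."""
    return CONTACT_ENERGY[tuple(sorted((a, b)))]

def contact_energy(sequence, coords, cutoff=6.5, min_separation=2):
    """Sum contact energies over all residue pairs closer than `cutoff`.

    `coords` holds one representative point per residue (for example a
    C-beta position). The cutoff and the minimum sequence separation are
    assumed values, not taken from any specific potential.
    """
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < min_separation:
            continue  # skip residues adjacent in sequence
        if math.dist(coords[i], coords[j]) <= cutoff:
            total += pair_energy(sequence[i], sequence[j])
    return total
```

Comparing `contact_energy` across alternative models of the same sequence gives a ranking in which lower totals are treated as more favourable.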
 
==Sippl's potential of mean force==
 
===Overview===
Many textbooks present the potentials of mean force (PMFs) as proposed by Sippl<ref name="Sippl_a" /> as a simple consequence of the [[Boltzmann distribution]], as applied to pairwise distances between amino acids. This is incorrect, but it is a useful starting point for introducing how the potential is constructed in practice.
The Boltzmann distribution applied to a specific pair of amino acids
is given by:
 
:<math>
P\left(r\right)=\frac{1}{Z}e^{-\frac{F\left(r\right)}{kT}}
</math>
 
where <math>r</math> is the distance, <math>k</math> is the [[Boltzmann constant]], <math>T</math> is
the temperature and <math>Z</math> is the [[partition function (statistical mechanics)|partition function]], with
 
:<math>
Z=\int e^{-\frac{F(r)}{kT}}dr
</math>
 
The quantity <math>F(r)</math> is the free energy assigned to the pairwise system.
Simple rearrangement results in the ''inverse Boltzmann formula'',
which expresses the free energy <math>F(r)</math> as a function of <math>P(r)</math>:
 
:<math>
F\left(r\right)=-kT\ln P\left(r\right)-kT\ln Z
</math>
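
The rearrangement can be checked numerically. The sketch below assumes that the distance is discretised on a grid, so that the integral defining <math>Z</math> becomes a sum; the function names and the example values are illustrative only.

```python
import math

def boltzmann(free_energy, kt):
    """Return (P, Z) for free energies tabulated on a discrete distance grid."""
    weights = [math.exp(-f / kt) for f in free_energy]
    z = sum(weights)  # discrete stand-in for the integral defining Z
    return [w / z for w in weights], z

def inverse_boltzmann(p, z, kt):
    """Recover F(r) = -kT ln P(r) - kT ln Z from the probabilities."""
    return [-kt * math.log(pi) - kt * math.log(z) for pi in p]

# Round trip on arbitrary example values: F -> P -> F.
F = [0.0, 1.0, -0.5, 2.0]
P, Z = boltzmann(F, kt=0.6)
F_recovered = inverse_boltzmann(P, Z, kt=0.6)
```

Up to floating-point rounding, `F_recovered` matches `F`, which is all that the inverse Boltzmann formula states.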
 
To construct a PMF, one then introduces a so-called ''reference
state'' with a corresponding distribution <math>Q_{R}</math> and partition function
<math>Z_{R}</math>, and calculates the following free energy difference:
 
:<math>
\Delta F\left(r\right)=-kT\ln\frac{P\left(r\right)}{Q_{R}\left(r\right)}-kT\ln\frac{Z}{Z_{R}}
</math>
 
The reference state typically results from a hypothetical
system in which the specific interactions between the amino acids
are absent. The second term involving <math>Z</math> and
<math>Z_{R}</math> can be ignored, as it is a constant.
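
A minimal sketch of this construction, assuming that both <math>P(r)</math> and <math>Q_{R}(r)</math> are available as counts over the same distance bins. The value of <math>kT</math> and the pseudocount that guards against empty bins are assumptions of the example, not part of the formula.

```python
import math

KT = 0.59  # roughly kT in kcal/mol near room temperature; assumed for illustration

def normalize(counts):
    """Turn raw bin counts into probabilities that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]

def pmf_from_counts(observed_counts, reference_counts, kt=KT, pseudocount=1.0):
    """Return Delta F per distance bin as -kT * ln(P / Q_R).

    The constant term -kT ln(Z / Z_R) is dropped, as in the text.
    The pseudocount is an added assumption to avoid taking log(0).
    """
    p = normalize([c + pseudocount for c in observed_counts])
    q = normalize([c + pseudocount for c in reference_counts])
    return [-kt * math.log(pi / qi) for pi, qi in zip(p, q)]
```

Bins that are populated more often in the observed data than in the reference state receive a negative value, and under-populated bins receive a positive value.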
 
In practice, <math>P(r)</math> is estimated from the database of known protein
structures, while <math>Q_{R}(r)</math> typically results from calculations
or simulations. For example, <math>P(r)</math> could be the conditional probability
of finding the <math>C\beta</math> atoms of a valine and a serine at a given
distance <math>r</math> from each other, giving rise to the free energy difference
<math>\Delta F</math>. The total free energy difference of a protein,
<math>\Delta F_{\textrm{T}}</math>, is then claimed to be the sum
of all the pairwise free energies:
 
:<math>
\Delta F_{\textrm{T}}=\sum_{i<j}\Delta F(r_{ij}\mid a_{i},a_{j})=-kT\sum_{i<j}\ln\frac{P\left(r_{ij}\mid a_{i},a_{j}\right)}{Q_{R}\left(r_{ij}\mid a_{i},a_{j}\right)}
</math>
 
where the sum runs over all amino acid pairs <math>a_{i},a_{j}</math>
(with <math>i<j</math>) and <math>r_{ij}</math> is their corresponding distance. In
many studies, <math>Q_{R}</math> does not depend on the amino
acid sequence.<ref>Rooman M, Wodak S (1995) Are database-derived potentials valid for scoring both forward and inverted protein folding? Protein Eng 8: 849–858.</ref>
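
The sum over residue pairs can be sketched as a table lookup. The layout of the table (one list of binned <math>\Delta F</math> values per unordered residue-type pair), the bin width, and the choice to let pairs beyond the tabulated range contribute zero are assumptions of this example.

```python
import math
from itertools import combinations

def total_pmf(sequence, coords, table, bin_width=1.0):
    """Sum binned pairwise Delta F values over all residue pairs i < j.

    `table` maps a sorted pair of residue types to a list of Delta F values,
    one per distance bin of width `bin_width` (a hypothetical layout).
    """
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        r = math.dist(coords[i], coords[j])
        bins = table[tuple(sorted((sequence[i], sequence[j])))]
        k = int(r // bin_width)
        if k < len(bins):  # pairs beyond the tabulated range contribute zero
            total += bins[k]
    return total
```

With this layout, a valine and a serine at distance 2.5 fall in the third bin of the `("S", "V")` entry when `bin_width` is 1.0.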
 
Intuitively, it is clear that a low value for <math>\Delta F_{\textrm{T}}</math> indicates
that the set of distances in a structure is more likely in proteins than
in the reference state. However, the physical meaning of these PMFs has
been widely disputed since their introduction.<ref name="Thomas" /><ref name="BenNaim" /><ref>Koppensteiner WA, Sippl MJ (1998) Knowledge-based potentials–back to the roots. Biochemistry Mosc 63: 247–252.</ref><ref>Shortle D (2003) Propensities, probabilities, and the Boltzmann hypothesis. Protein Sci 12: 1298–1302.</ref> The main issues are the interpretation of this "potential" as a true, physically valid [[potential of mean force]], the nature of the reference state and its optimal formulation, and the validity of generalizations beyond pairwise distances.
 
==Justification==
 
===Analogy with liquid systems===
The first, qualitative justification of PMFs is due to Sippl and is
based on an analogy with the statistical physics of liquids.<ref name="Sippl_b">Sippl MJ, Ortner M, Jaritz M, Lackner P, Flockner H (1996) Helmholtz free energies of atom pair interactions in proteins. Fold Des 1: 289–98.</ref>
For liquids,<ref name="Chandler">Chandler D (1987) Introduction to Modern Statistical Mechanics. New York: Oxford University Press, USA.</ref>
the potential of mean force is related to the [[radial distribution function]] <math>g(r)</math>, which is given by:
 
:<math>
g(r)=\frac{P(r)}{Q_{R}(r)}
</math>
 
where <math>P(r)</math> and <math>Q_{R}(r)</math> are the respective probabilities of
finding two particles at a distance <math>r</math> from each other in the liquid
and in the reference state. For liquids, the reference state
is clearly defined; it corresponds to the ideal gas, consisting of
non-interacting particles. The two-particle potential of mean force
<math>W(r)</math> is related to <math>g(r)</math> by:
 
:<math>
W(r)=-kT\ln g(r)=-kT\ln\frac{P(r)}{Q_{R}(r)}
</math>
 
According to the [[reversible work theorem]], the two-particle
potential of mean force <math>W(r)</math> is the reversible work required to
bring two particles in the liquid from infinite separation to a distance
<math>r</math> from each other.<ref name="Chandler" />
 
A few years after introducing PMFs for use in protein structure
prediction, Sippl justified them by appealing to the analogy with the
reversible work theorem for liquids.<ref name="Sippl_b" /> For liquids, <math>g(r)</math> can be experimentally measured
using [[small angle X-ray scattering]]; for proteins, <math>P(r)</math> is obtained
from the set of known protein structures, as explained in the previous
section. However, as Ben-Naim writes in a publication on the subject:<ref name="BenNaim" />
<blockquote>
[...]the quantities, referred to as `statistical potentials,' `structure
based potentials,' or `pair potentials of mean force', as derived from
the protein data bank, are neither `potentials' nor `potentials of
mean force,' in the ordinary sense as used in the literature on
liquids and solutions.
</blockquote>
Another issue is that the analogy does not specify
a suitable reference state for proteins.
 
===Analogy with likelihood===
Baker and co-workers <ref>Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268: 209–225.</ref> justified PMFs from a
Bayesian point of view and used these insights in the construction of
the coarse grained ROSETTA energy function. According
to Bayesian probability calculus, the conditional probability <math>P(X\mid
A)</math> of a structure <math>X</math>, given the amino acid sequence <math>A</math>, can be
written as:
 
:<math>
P\left(X\mid A\right)=\frac{P\left(A\mid
X\right)P\left(X\right)}{P\left(A\right)}\propto P\left(A\mid
X\right)P\left(X\right)
</math>
 
<math>P(X\mid A)</math> is proportional to the product of
the likelihood <math>P\left(A\mid X\right)</math> and the prior
<math>P\left(X\right)</math>. By assuming that the likelihood can be approximated
as a product of pairwise probabilities, and applying Bayes' theorem, the
likelihood can be written as:
 
:<math>
P\left(A\mid
X\right)\approx\prod_{i<j}P\left(a_{i},a_{j}\mid
r_{ij}\right)\propto\prod_{i<j}\frac{P\left(r_{ij}\mid
a_{i},a_{j}\right)}{P(r_{ij})}
</math>
 
where the product runs over all amino acid pairs <math>a_{i},a_{j}</math> (with
<math>i<j</math>), and <math>r_{ij}</math> is the distance between amino acids <math>i</math> and <math>j</math>.
The negative logarithm of this expression
has the same functional form as the classic
pairwise distance PMFs, with the denominator playing the role of the
reference state. This explanation has two shortcomings: it is purely qualitative,
and it relies on the unfounded assumption that the likelihood can be expressed
as a product of pairwise probabilities.
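
The Bayes step for a single pair can be verified on a toy count table. The counts below are invented for illustration; the point is only that <math>P(a_{i},a_{j}\mid r_{ij})</math> equals <math>P(r_{ij}\mid a_{i},a_{j})/P(r_{ij})</math> multiplied by <math>P(a_{i},a_{j})</math>, a factor that depends on the sequence but not on the structure.

```python
# Hypothetical joint counts N[pair][bin] for two residue-type pairs
# and three distance bins; the numbers carry no biological meaning.
COUNTS = {
    ("V", "S"): [5, 20, 25],
    ("L", "L"): [30, 15, 5],
}

def bayes_terms(counts, pair, r_bin):
    """Return P(pair | r), P(r | pair), P(r) and P(pair) from joint counts."""
    total = sum(sum(row) for row in counts.values())
    pair_total = sum(counts[pair])
    bin_total = sum(row[r_bin] for row in counts.values())
    p_pair_given_r = counts[pair][r_bin] / bin_total
    p_r_given_pair = counts[pair][r_bin] / pair_total
    p_r = bin_total / total
    p_pair = pair_total / total
    return p_pair_given_r, p_r_given_pair, p_r, p_pair

posterior, likelihood, p_r, p_pair = bayes_terms(COUNTS, ("V", "S"), 0)
# Bayes' theorem: the ratio likelihood / p_r, times p_pair, equals the posterior.
ratio_times_prior = likelihood / p_r * p_pair
```

Here `ratio_times_prior` and `posterior` agree up to rounding. The check concerns one pair only; it says nothing about whether the product over all pairs is a valid approximation of the full likelihood.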
 
===Reference ratio explanation===
 
[[Image:ratio reference method.svg|thumb|350px|right|The reference ratio method. <math>Q(X)</math> is a probability distribution that describes the structure of proteins on a local length scale (right). Typically, <math>Q(X)</math> is embodied in a fragment library, but other possibilities are an energy function or a [[graphical model]]. In order to obtain a complete description of protein structure, one also needs a probability distribution <math>P(Y)</math> that describes nonlocal aspects, such as hydrogen bonding. <math>P(Y)</math> is typically obtained from a set of solved protein structures from the [[Protein data bank]] (PDB, left). In order to combine <math>Q(X)</math> with <math>P(Y)</math> in a meaningful way, one needs the reference ratio expression (bottom), which takes the signal in <math>Q(X)</math> with respect to <math>Y</math> into account.]]
 
Expressions that resemble PMFs naturally result from the application of
probability theory to solve a fundamental problem that arises in protein
structure prediction: how to improve an imperfect probability
distribution <math>Q(X)</math> over a first variable <math>X</math> using a probability
distribution <math>P(Y)</math> over a second variable <math>Y</math>, with <math>Y=f(X)</math>.<ref name="ratio" /> Typically, <math>X</math> and <math>Y</math> are fine and coarse grained variables, respectively. For example, <math>Q(X)</math> could concern
the local structure of the protein, while <math>P(Y)</math> could concern the pairwise distances between the amino acids. In that case, <math>X</math> could for example be a vector of dihedral angles that specifies all atom positions (assuming ideal bond lengths and angles).
In order to combine the two distributions, such that the local structure will be distributed according to <math>Q(X)</math>, while
the pairwise distances will be distributed according to <math>P(Y)</math>, the following expression is needed:
 
:<math>
P(X,Y)=\frac{P(Y)}{Q(Y)}Q(X)
</math>
 
where <math>Q(Y)</math> is the distribution over <math>Y</math> implied by <math>Q(X)</math>. The ratio in the expression corresponds
to the PMF. Typically, <math>Q(X)</math> is brought in by sampling (usually from a fragment library) and is not explicitly evaluated; the ratio, which in contrast is explicitly evaluated, corresponds to Sippl's potential of mean force. This explanation is quantitative, and allows the generalization of PMFs from pairwise distances to arbitrary coarse grained variables. It also
provides a rigorous definition of the reference state, which is implied by <math>Q(X)</math>. Conventional applications of pairwise distance PMFs usually lack two
necessary features to make them fully rigorous: the use of a proper probability distribution over pairwise distances in proteins, and the recognition that the reference state is rigorously
defined by <math>Q(X)</math>.
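
The reference ratio expression can be illustrated on a small discrete example. The four-state variable <math>X</math>, the mapping <math>f</math>, and all probabilities below are invented for the sketch; it assumes that <math>Q(Y)</math> is non-zero wherever <math>P(Y)</math> is.

```python
from collections import defaultdict

def marginal(dist_x, f):
    """Distribution over Y = f(X) implied by a distribution over X."""
    out = defaultdict(float)
    for x, px in dist_x.items():
        out[f(x)] += px
    return dict(out)

def reference_ratio(q_x, f, p_y):
    """Reweight Q(X) by the ratio P(Y) / Q(Y), with Y = f(X)."""
    q_y = marginal(q_x, f)  # the reference state implied by Q(X)
    return {x: p_y[f(x)] / q_y[f(x)] * qx for x, qx in q_x.items()}

# Toy example: X takes four values and Y is the parity of X.
Q_X = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
P_Y = {0: 0.7, 1: 0.3}

def parity(x):
    return x % 2

P_X = reference_ratio(Q_X, parity, P_Y)
```

In this example the marginal of `P_X` over <math>Y</math> reproduces `P_Y`, while the conditional distribution of <math>X</math> given <math>Y</math> is unchanged from `Q_X`, which is the behaviour the reference ratio method is meant to provide.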
 
==Applications==
Statistical potentials are used as [[energy function]]s to assess ensembles of structural models produced by [[homology modeling]] or [[protein threading]], that is, predictions of the tertiary structure adopted by a particular [[amino acid sequence]], made by comparison with one or more [[homology (biology)|homologous]] proteins of known structure. Many differently parameterized statistical potentials have been shown to successfully identify the native state structure from an ensemble of "decoy" or non-native structures.<ref name="MJ">Miyazawa S. & Jernigan RL. (1996). Residue–Residue Potentials with a Favorable Contact Pair Term and an Unfavorable High Packing Density Term, for Simulation and Threading. ''J Mol Biol'' 256:623–644.</ref><ref name="TobiElber">Tobi D & Elber R. (2000). Distance Dependent, Pair Potential for Protein Folding: Results from Linear Optimization. ''Proteins'' 41:40-46.</ref><ref name="Sali">Shen MY & Sali A. (2006). Statistical potential for assessment and prediction of protein structures. ''Protein Sci'' 15:2507-2524.</ref><ref name="Narang">Narang P, Bhushan K, Bose S, Jayaram B. (2006). Protein structure evaluation using an all-atom energy based empirical scoring function. ''J Biomol Struct Dyn'' 23(4):385-406.</ref><ref name="Sippl">Sippl MJ. (1993). Recognition of Errors in Three-Dimensional Structures of Proteins. ''Proteins'' 17:355-62.</ref><ref name="Bryant">Bryant SH, Lawrence CE. (1993). An empirical energy function for threading protein sequence through the folding motif. ''Proteins'' 16(1):92-112.</ref> Statistical potentials are used not only for [[protein structure prediction]] but also for modelling the [[protein folding]] pathway.<ref name="Kmiecik">{{cite journal |author=Kmiecik S and Kolinski A |title=Characterization of protein-folding pathways by reduced-space modeling |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=104 |issue=30 |pages=12330–12335 |year=2007 |url=http://www.pnas.org/cgi/content/abstract/104/30/12330 |pmid=17636132 |doi=10.1073/pnas.0702265104 |pmc=1941469}}</ref><ref name="Adhikari">{{cite journal |author=Adhikari AN, Freed KF and Sosnick TR |title=De novo prediction of protein folding pathways and structure using the principle of sequential stabilization |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=109 |issue=43 |pages=17442–17447 |year=2012 |url=http://www.pnas.org/content/109/43/17442.abstract}}</ref>
 
==References==
<references />
 
==See also==
* [[Potential energy]]
* [[Molecular dynamics]]
* [[Bond order potential]]
 
[[Category:Bioinformatics]]
[[Category:Protein structure]]

Latest revision as of 21:10, 12 January 2015
