In [[information geometry]], the '''Fisher information metric''' is a particular [[Riemannian metric]] which can be defined on a smooth [[statistical manifold]], ''i.e.'', a [[smooth manifold]] whose points are [[probability measure]]s defined on a common [[probability space]]. It can be used to calculate the informational difference between measurements.
 
The metric is interesting in several respects.  First, it can be understood to be the infinitesimal form of the relative entropy (''i.e.'', the [[Kullback–Leibler divergence]]); specifically, it is the [[Hessian matrix|Hessian]] of the divergence.  Alternately, it can be understood as the metric induced by the flat space [[Euclidean metric]], after appropriate changes of variable.  When extended to complex [[projective Hilbert space]], it becomes the [[Fubini–Study metric]]; when written in terms of [[mixed state (physics)|mixed states]], it is the quantum [[Bures metric]].
 
Considered purely as a matrix, it is known as the [[Fisher information matrix]]. Considered as a measurement technique, where it is used to estimate hidden parameters in terms of observed random variables, it is the expected information; it should not be confused with the [[observed information]], which is evaluated at the realized data rather than averaged over the distribution.
 
==Definition==
Given a statistical manifold, with coordinates given by <math>\theta=(\theta_1, \theta_2, \cdots, \theta_n)</math>, one writes <math>p(x,\theta)</math> for the probability distribution.  Here, <math>x</math> is a specific value drawn from a collection of (discrete or continuous) [[random variables]] ''X''.  The probability is normalized, in that
:<math>\int_X p(x,\theta) \,dx = 1</math>
 
The Fisher information metric then takes the form:
 
:<math>
g_{jk}(\theta)
=
\int_X
\frac{\partial \log p(x,\theta)}{\partial \theta_j}
\frac{\partial \log p(x,\theta)}{\partial \theta_k}
p(x,\theta) \, dx.
</math>
 
The integral is performed over all values ''x'' of all random variables ''X''.  Again, the variable <math>\theta</math> is understood as a coordinate on the [[Riemannian manifold]].  The labels ''j'' and ''k'' index the local coordinate axes on the manifold.
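As an illustrative sketch (not part of the standard exposition), the definition can be checked numerically. The example below, assuming NumPy and using the author's choice of the two-parameter normal family <math>\theta=(\mu,\sigma)</math>, approximates the defining integral by a finite sum and compares it with the known closed form <math>g = \operatorname{diag}(1/\sigma^2,\, 2/\sigma^2)</math>.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: the Fisher information metric for the normal family
# p(x; theta), theta = (mu, sigma), obtained by integrating the outer product
# of the score against p(x, theta) dx on a fine grid.

def log_p(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def fisher_metric(mu, sigma, eps=1e-5):
    x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200001)
    dx = x[1] - x[0]
    p = np.exp(log_p(x, mu, sigma))
    theta = np.array([mu, sigma])
    # numerical partial derivatives of log p with respect to theta_j (the scores)
    scores = []
    for j in range(2):
        dt = np.zeros(2)
        dt[j] = eps
        scores.append((log_p(x, *(theta + dt)) - log_p(x, *(theta - dt))) / (2 * eps))
    g = np.empty((2, 2))
    for j in range(2):
        for k in range(2):
            g[j, k] = np.sum(scores[j] * scores[k] * p) * dx
    return g

print(fisher_metric(0.0, 2.0))              # numerical estimate
print(np.diag([1 / 2.0**2, 2 / 2.0**2]))    # closed form, for comparison
</syntaxhighlight>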
 
When the probability is derived from the [[Gibbs measure]], as it would be for any [[Markovian process]], then <math>\theta</math> can also be understood to be a [[Lagrange multiplier]]; Lagrange multipliers are used to enforce constraints, such as holding the [[expectation value]] of some quantity constant. If there are ''n'' constraints holding ''n'' different expectation values constant, then the manifold is ''n''-dimensional.  In this case, the metric can be explicitly derived from the [[partition function (mathematics)|partition function]]; a derivation and discussion is presented there.
 
Substituting <math>i(x,\theta) = -\ln p(x,\theta)</math>, the [[information content]] (surprisal) from [[information theory]], an equivalent form of the above definition is obtained; the two forms agree because the normalization <math>\textstyle\int_X p(x,\theta)\,dx = 1</math> implies <math>\textstyle\int_X \frac{\partial^2 p(x,\theta)}{\partial\theta_j\,\partial\theta_k}\,dx = 0</math>:
 
:<math>
g_{jk}(\theta)
=
\int_X
\frac{\partial^2 i(x,\theta)}{\partial \theta_j \partial \theta_k}
p(x,\theta) \, dx
=
\mathrm{E}
\left[
\frac{\partial^2 i(x,\theta)}{\partial \theta_j \partial \theta_k}
\right].
</math>
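A brief numerical check of this equivalence, assuming SciPy and using the one-parameter Poisson family (whose Fisher information is <math>1/\lambda</math>) purely as an illustration:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import poisson

# Sketch: for the Poisson family, E[(d log p / d lam)^2] equals E[d^2 i / d lam^2]
# with i = -log p, and both equal 1/lam.
lam, eps = 3.0, 1e-4
x = np.arange(0, 200)                                   # support, truncated far into the tail
p = poisson.pmf(x, lam)
logp = lambda l: poisson.logpmf(x, l)

score = (logp(lam + eps) - logp(lam - eps)) / (2 * eps)                   # d log p / d lam
d2i = -(logp(lam + eps) - 2 * logp(lam) + logp(lam - eps)) / eps**2       # d^2 (-log p) / d lam^2

print(np.sum(score**2 * p))   # outer-product form
print(np.sum(d2i * p))        # expected-Hessian form
print(1 / lam)                # exact value
</syntaxhighlight>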
 
==Relation to the Kullback–Leibler divergence==
Alternately, the metric can be obtained as the second derivative of the ''relative entropy'' or [[Kullback–Leibler divergence]]. To obtain this, one considers two probability distributions <math> P = P(\theta) </math> and <math>Q = P(\theta_0)</math>, which are infinitesimally close to one another, so that
 
:<math>P = Q + \sum_j \Delta\theta^j Q_j</math>
 
with <math>\Delta\theta^j</math> an infinitesimally small change of <math>\theta</math> in the ''j'' direction, and <math>Q_j = \left.\frac{\partial P}{\partial \theta^j}\right|_{\theta = \theta_0}</math> the rate of change of the probability distribution.  Then, since the Kullback–Leibler divergence <math>D_{\mathrm{KL}}(P\|Q)</math> has an absolute minimum of 0 at ''P'' = ''Q'', one has an expansion up to second order about <math>\theta = \theta_0</math> of the form
:<math>f_{\theta_0}(\theta) := D_{\mathrm{KL}}(P \| Q) = \tfrac{1}{2}\sum_{jk}\Delta\theta^j\Delta\theta^k g_{jk}(\theta_0) + \mathcal{O}(\Delta\theta^3)</math>.  
 
The symmetric matrix <math>g_{jk}</math> is positive semi-definite and is the [[Hessian matrix]] of the function <math>f_{\theta_0}</math> at the stationary point <math>\theta_0</math>. This can be thought of intuitively as: "The distance between two infinitesimally close points on a statistical differential manifold is the amount of information, ''i.e.'' the informational difference between them."
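The quadratic behaviour of the divergence near <math>\theta_0</math> can be illustrated with a small numerical sketch, assuming NumPy; the Bernoulli family, with <math>g(\theta) = 1/(\theta(1-\theta))</math>, is chosen here only for concreteness.

<syntaxhighlight lang="python">
import numpy as np

# Sketch: for the Bernoulli family, the Kullback-Leibler divergence between nearby
# parameters matches the quadratic form (1/2) g(theta_0) dtheta^2.

def kl_bernoulli(t, t0):
    return t * np.log(t / t0) + (1 - t) * np.log((1 - t) / (1 - t0))

theta0, dtheta = 0.3, 1e-3
g = 1.0 / (theta0 * (1.0 - theta0))

print(kl_bernoulli(theta0 + dtheta, theta0))   # exact divergence
print(0.5 * g * dtheta**2)                     # second-order approximation
</syntaxhighlight>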
 
== Relation to Ruppeiner geometry==
The [[Ruppeiner metric]] and [[Weinhold metric]] arise as the [[thermodynamic limit]] of the Fisher information metric.<ref name="crooks">Gavin E. Crooks, "Measuring thermodynamic length" (2007), [http://arxiv.org/abs/0706.0559 arXiv:0706.0559]; ''Physical Review Letters'' '''99''', 100602 (2007). DOI: 10.1103/PhysRevLett.99.100602</ref>
 
== Change in entropy==
The [[geodesic|action]] of a curve on a [[Riemannian manifold]] is given by
:<math>A=\frac{1}{2}\int_a^b
\frac{\partial\theta^j}{\partial t}
g_{jk}(\theta)\frac{\partial\theta^k}{\partial t} dt</math>
The path parameter here is time ''t''; this action can be understood to give the change in [[entropy]] of a system as it is moved from time ''a'' to time ''b''.<ref name="crooks"/> Specifically, one has
 
:<math>\Delta S = (b-a) A</math>
 
as the change in entropy.  This observation has found practical application in the [[chemical industry|chemical]] and [[processing industry|processing]] industries: in order to minimize the change in entropy of a system, one should follow the minimal [[geodesic]] path between the desired endpoints of the process. The geodesic minimizes the entropy change because, by the [[Cauchy–Schwarz inequality]], the action is bounded below by the squared length of the curve divided by <math>2(b-a)</math>, with equality when the curve is traversed at constant speed.
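The Cauchy–Schwarz bound can be verified numerically; the sketch below, assuming NumPy and using the Bernoulli manifold with an arbitrarily chosen (non-geodesic) path, is an illustration rather than a prescription.

<syntaxhighlight lang="python">
import numpy as np

# Sketch: action and length of a path theta(t) on the Bernoulli manifold, with
# Fisher metric g(theta) = 1/(theta (1 - theta)), checking the Cauchy-Schwarz
# bound L^2 <= 2 (b - a) A that makes geodesics minimize the action.

a, b = 0.0, 1.0
t = np.linspace(a, b, 10001)
theta = 0.2 + 0.6 * t**2                   # an arbitrary (non-geodesic) path from 0.2 to 0.8
dtheta_dt = np.gradient(theta, t)
g = 1.0 / (theta * (1.0 - theta))          # Fisher metric along the path

speed2 = g * dtheta_dt**2
A = 0.5 * np.trapz(speed2, t)              # action of the path
L = np.trapz(np.sqrt(speed2), t)           # length of the path
print(A, L, L**2 <= 2 * (b - a) * A)       # the bound holds; equality requires constant speed
</syntaxhighlight>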
 
==Relation to the Jensen–Shannon divergence==
The Fisher metric also allows the action and the curve length to be related to the [[Jensen–Shannon divergence]].<ref name="crooks"/> Specifically, one has
 
:<math>(b-a)\int_a^b
\frac{\partial\theta^j}{\partial t}
g_{jk}\frac{\partial\theta^k}{\partial t} dt =
8\int_a^b dJSD</math>
where the integrand ''dJSD'' is understood to be the infinitesimal change in the Jensen–Shannon divergence along the path taken.  Similarly, for the [[curve length]], one has
:<math>\int_a^b \sqrt{
\frac{\partial\theta^j}{\partial t}
g_{jk}\frac{\partial\theta^k}{\partial t}} dt =
\sqrt{8}\int_a^b \sqrt{dJSD}</math>
That is, the square root of the infinitesimal Jensen–Shannon divergence is just the Fisher line element, divided by the square root of 8.
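A numerical illustration of this relation, assuming NumPy and using two nearby Bernoulli distributions (the choice of family is made here only for concreteness):

<syntaxhighlight lang="python">
import numpy as np

# Sketch: for two nearby Bernoulli distributions, the Jensen-Shannon divergence is
# approximately one-eighth of the squared Fisher line element, dJSD ~ (1/8) g dtheta^2.

def kl(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

theta, dtheta = 0.4, 1e-3
g = 1.0 / (theta * (1.0 - theta))

print(jsd(theta, theta + dtheta))   # exact Jensen-Shannon divergence
print(g * dtheta**2 / 8.0)          # one-eighth of the squared Fisher line element
</syntaxhighlight>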
 
==As Euclidean metric==
For a [[discrete probability space]], that is, a probability space on a finite set of objects, the Fisher metric can be understood to simply be the flat [[Euclidean metric]], after appropriate changes of variable.<ref name="gromov">Misha Gromov, (2012) "[http://www.ihes.fr/~gromov/PDF/structre-serch-entropy-july5-2012.pdf In a Search for a Structure, Part 1: On Entropy.]"</ref>
 
An ''N''-dimensional sphere embedded in (''N''&nbsp;+&nbsp;1)-dimensional space is defined as
:<math>\sum_i y_i^2 = 1</math>
The metric on the surface of the sphere is given by
:<math>h=\sum_i dy_i \; dy_i</math>
where the <math>\textstyle dy_i</math> are [[1-form]]s; they are the basis vectors for the [[cotangent space]].  Writing <math>\textstyle \frac{\partial}{\partial y_j}</math> as the basis vectors for the [[tangent space]], so that <math>\textstyle dy_j\left(\frac{\partial}{\partial y_k}\right) = \delta_{jk}</math>, the Euclidean metric may be written as
 
:<math>h^\mathrm{flat}_{jk} = h\left(\tfrac{\partial}{\partial y_j},
\tfrac{\partial}{\partial y_k}\right) = \delta_{jk}</math>
The superscript 'flat' serves as a reminder that, when written in coordinate form, this metric is expressed with respect to the flat-space coordinate <math>y</math>. Consider now the change of variable <math>p_i=y_i^2</math>.  The sphere condition now becomes the probability normalization condition
:<math>\sum_i p_i = 1</math>
while the metric becomes
:<math>\begin{align} h &=\sum_i dy_i \; dy_i
= \sum_i d\sqrt{p_i} \; d\sqrt{p_i} \\
&= \frac{1}{4}\sum_i \frac{dp_i \; dp_i}{p_i}
= \frac{1}{4}\sum_i p_i\; d(\log p_i) \; d(\log p_i)
\end{align}</math>
The last can be recognized as one-fourth of the Fisher information metric.  To complete the process, recall that the probabilities are parametric functions of the manifold variables <math>\theta</math>, that is, one has <math>p_i = p_i(\theta)</math>.  Thus, the above induces a metric on the parameter manifold:
:<math>\begin{align} h
& = \frac{1}{4}\sum_i p_i(\theta) \; d(\log p_i(\theta))\; d(\log p_i(\theta)) \\
&= \frac{1}{4}\sum_{jk} \sum_i p_i(\theta) \;
\frac{\partial \log p_i(\theta)} {\partial \theta_j}
\frac{\partial \log p_i(\theta)} {\partial \theta_k}
d\theta_j d\theta_k
\end{align}</math>
or, in coordinate form, the Fisher information metric is:
:<math> \begin{align}
g_{jk}(\theta)
= 4h_{jk}^\mathrm{fisher}
&= 4 h\left(\tfrac{\partial}{\partial \theta_j},
\tfrac{\partial}{\partial \theta_k}\right) \\
& = \sum_i p_i(\theta) \;
\frac{\partial \log p_i(\theta)} {\partial \theta_j} \;
\frac{\partial \log p_i(\theta)} {\partial \theta_k}  \\
& = \mathrm{E}\left[
\frac{\partial \log p_i(\theta)} {\partial \theta_j} \;
\frac{\partial \log p_i(\theta)} {\partial \theta_k}
\right]
\end{align}</math>
where, as before,
<math>\textstyle d\theta_j\left(\frac{\partial}{\partial \theta_k}\right) = \delta_{jk}</math>. The superscript 'fisher' serves as a reminder that this expression applies to the coordinates <math>\theta</math>, whereas the coordinate-free form is the same as the Euclidean (flat-space) metric. That is, the Fisher information metric on a statistical manifold is simply (four times) the flat Euclidean metric, after appropriate changes of variable.
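This relationship between the flat metric in the coordinates <math>y_i = \sqrt{p_i}</math> and the Fisher metric can be checked numerically. The sketch below, assuming NumPy, uses a categorical distribution on three outcomes with the illustrative parameterization <math>p(\theta) = (\theta_1, \theta_2, 1-\theta_1-\theta_2)</math>.

<syntaxhighlight lang="python">
import numpy as np

# Sketch: for a categorical distribution on three outcomes, the Fisher metric equals
# four times the flat Euclidean metric pulled back through y_i = sqrt(p_i).

def p_of_theta(theta):
    return np.array([theta[0], theta[1], 1.0 - theta[0] - theta[1]])

def jacobian(f, theta, eps=1e-6):
    cols = []
    for j in range(len(theta)):
        dt = np.zeros(len(theta))
        dt[j] = eps
        cols.append((f(theta + dt) - f(theta - dt)) / (2 * eps))
    return np.stack(cols, axis=1)        # shape (outcomes, parameters)

theta = np.array([0.2, 0.5])
p = p_of_theta(theta)

# Fisher metric: g_jk = sum_i p_i (d log p_i / d theta_j)(d log p_i / d theta_k)
J_logp = jacobian(lambda th: np.log(p_of_theta(th)), theta)
g = (J_logp * p[:, None]).T @ J_logp

# Pullback of the Euclidean metric through y = sqrt(p): h_jk = sum_i (dy_i/dtheta_j)(dy_i/dtheta_k)
J_y = jacobian(lambda th: np.sqrt(p_of_theta(th)), theta)
h = J_y.T @ J_y

print(g)
print(4 * h)    # agrees with g up to finite-difference error
</syntaxhighlight>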
 
When the random variable <math>x</math> is not discrete but continuous, the argument still holds. This can be seen in one of two ways. One way is to carefully recast all of the above steps in an infinite-dimensional space, taking care to define limits appropriately so that all manipulations are well-defined and convergent. The other way, as noted by [[Mikhail Gromov (mathematician)|Gromov]],<ref name="gromov"/> is to use a [[category theory|category-theoretic]] approach; that is, to note that the above manipulations remain valid in the category of probabilities.
 
==As Fubini–Study metric==
The above manipulations deriving the Fisher metric from the Euclidean metric can be extended to complex [[projective Hilbert space]]s. In this case, one obtains the [[Fubini–Study metric]].<ref name="facchi">Paolo Facchi, Ravi Kulkarni, V. I. Man'ko, Giuseppe Marmo, E. C. G. Sudarshan, Franco Ventriglia, "[http://arxiv.org/abs/1009.5219 Classical and Quantum Fisher Information in the Geometrical Formulation of Quantum Mechanics]" (2010), ''Physics Letters A'' '''374''', p. 4801. DOI: 10.1016/j.physleta.2010.10.005</ref> This should perhaps be no surprise, as the Fubini–Study metric provides the means of measuring information in quantum mechanics.  The [[Bures metric]], also known as the [[Helstrom metric]], is identical to the Fubini–Study metric,<ref name="facchi"/> although the latter is usually written in terms of [[pure state]]s, as below, whereas the Bures metric is written for [[mixed state (physics)|mixed states]].  By setting the phase of the complex coordinate to zero, one obtains exactly one-fourth of the Fisher information metric, as above.
 
One begins with the same trick, of constructing a [[probability amplitude]], written in [[polar coordinate]]s, so:
 
:<math>\psi(x;\theta) = \sqrt{p(x; \theta)} \; e^{i\alpha(x;\theta)} </math>
 
Here, <math>\psi(x;\theta)</math> is a complex-valued [[probability amplitude]]; <math>p(x; \theta)</math> and <math>\alpha(x;\theta) </math> are strictly real.  The previous calculations are obtained by
setting <math>\alpha(x;\theta)=0</math>.  The usual condition that probabilities lie within a [[simplex]], namely that
 
:<math>\int_X p(x;\theta)dx =1</math>
 
is equivalently expressed by requiring that the squared amplitude be normalized:
 
:<math>\int_X \vert \psi(x;\theta)\vert^2 dx = 1</math>
 
When <math>\psi(x;\theta)</math> is real, this is the surface of a sphere.
 
The [[Fubini–Study metric]], written in infinitesimal form, using quantum-mechanical [[bra-ket notation]], is
 
:<math>ds^2 = \frac{\langle \delta \psi \vert \delta \psi \rangle}
{\langle \psi \vert \psi \rangle} -
\frac {\langle \delta \psi \vert \psi \rangle \;
\langle \psi \vert \delta \psi \rangle}
{{\langle \psi \vert \psi \rangle}^2}.
</math>
 
In this notation, one has that
:<math> \langle x\vert \psi\rangle = \psi(x;\theta)</math>
and integration over the entire measure space ''X'' is written as
:<math> \langle \phi \vert \psi\rangle = \int_X \phi^*(x;\theta) \psi(x;\theta) dx </math>
 
The expression <math>\vert \delta \psi \rangle</math> can be understood to be an infinitesimal variation; equivalently, it can be understood to be a [[1-form]] in the [[cotangent space]].  Using the infinitesimal notation, the variation of the polar form of the amplitude above is simply
 
:<math>\delta\psi = \left(\frac{\delta p}{2p} + i \delta \alpha\right) \psi</math>
 
Inserting the above into the Fubini–Study metric gives:
 
:<math>\begin{align} ds^2 & =
\frac{1}{4}\int_X (\delta \log p)^2 \;pdx
+ \int_X  (\delta \alpha)^2 \;pdx
- \left(\int_X \delta \alpha \;pdx\right)^2 \\
& -\frac{i}{2} \int_X (\delta \log p \delta\alpha - \delta\alpha \delta \log p) \;pdx
\end{align}       
</math>
 
Setting <math>\delta\alpha=0</math> in the above makes it clear that the first term is (one-fourth of) the Fisher information metric.  The full form of the above can be made slightly clearer by changing notation to that of standard Riemannian geometry, so that the metric becomes a symmetric [[2-form]] acting on the [[tangent space]]. The change of notation is made by simply replacing <math>\delta \to d</math> and <math>ds^2\to h</math> and noting that the integrals are just expectation values; so:
 
:<math>h = \frac{1}{4} \mathrm{E}\left[(d\log p)^2\right]
+ \mathrm{E}\left[(d\alpha)^2\right]
- \left(\mathrm{E}\left[d\alpha\right]\right)^2
- \frac{i}{2}\mathrm{E}\left[d\log p\wedge d\alpha\right]</math>
 
The imaginary term is a [[symplectic form]]; it is associated with the [[Berry phase]] or [[geometric phase]].  In index notation, the metric is:
 
:<math>\begin{align}h_{jk} & =
h\left(\tfrac{\partial}{\partial\theta_j}, \tfrac{\partial}{\partial\theta_k}\right)  \\
& = \frac{1}{4} \mathrm{E}\left[
\frac{\partial\log p}{\partial\theta_j}
\frac{\partial\log p}{\partial\theta_k}
\right]
+ \mathrm{E}\left[
\frac{\partial\alpha}{\partial\theta_j}
\frac{\partial\alpha}{\partial\theta_k}
\right]
- \mathrm{E}\left[ \frac{\partial\alpha}{\partial\theta_j} \right]
\mathrm{E}\left[ \frac{\partial\alpha}{\partial\theta_k} \right] \\
& - \frac{i}{2}\mathrm{E}\left[
\frac{\partial\log p}{\partial\theta_j}
\frac{\partial\alpha}{\partial\theta_k}
-
\frac{\partial\alpha}{\partial\theta_j}
\frac{\partial\log p}{\partial\theta_k}
\right]
\end{align}</math>
 
Again, the first term can be clearly seen to be (one fourth of) the Fisher information metric, by setting <math>\alpha=0</math>.  Equivalently, the Fubini–Study metric can be understood as the metric on complex projective Hilbert space that is induced by the complex extension of the flat Euclidean metric.  The difference between this, and the Bures metric, is that the Bures metric is written in terms of mixed states.
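As a concrete illustration, assuming NumPy, the sketch below evaluates the Fubini–Study metric for the real one-parameter qubit state <math>\psi(\theta) = (\cos(\theta/2), \sin(\theta/2))</math> (a choice made here only for illustration) and compares it with one-fourth of the Fisher metric of the induced probabilities <math>p_i = |\psi_i|^2</math>.

<syntaxhighlight lang="python">
import numpy as np

# Sketch: for a real one-parameter qubit state, the Fubini-Study metric equals
# one-fourth of the Fisher information metric of p_i = |psi_i|^2.

theta, eps = 0.7, 1e-6

def psi(th):
    return np.array([np.cos(th / 2), np.sin(th / 2)], dtype=complex)

ps = psi(theta)
dpsi = (psi(theta + eps) - psi(theta - eps)) / (2 * eps)
norm = np.vdot(ps, ps).real
fs = (np.vdot(dpsi, dpsi).real / norm
      - abs(np.vdot(ps, dpsi))**2 / norm**2)          # Fubini-Study metric (one parameter)

p = np.abs(ps)**2
dlogp = (np.log(np.abs(psi(theta + eps))**2) - np.log(np.abs(psi(theta - eps))**2)) / (2 * eps)
fisher = np.sum(p * dlogp**2)

print(fisher, 4 * fs)    # the two agree: g = 4 h_FS when the phase is constant
</syntaxhighlight>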
 
==Formal definition==
A slightly more formal, abstract definition can be given, as follows.<ref>Mitsuhiro Itoh and Yuichi Shishido, "[http://www.tulips.tsukuba.ac.jp/dspace/bitstream/2241/100265/1/DGA_26-4.pdf Fisher information metric and Poisson kernels]" (2008)</ref>
 
Let ''X'' be an [[orientable manifold]], and let <math>(X,\Sigma,\mu)</math> be a [[measure space]]. Equivalently, let <math>(\Omega, \mathcal{F},P)</math> be a [[probability space]] on <math>\Omega=X</math>, with [[sigma algebra]] <math>\mathcal{F}=\Sigma</math> and probability <math>P=\mu</math>.
 
The [[statistical manifold]] ''S''(''X'') of ''X'' is defined as the space of all measures <math>\mu</math> on ''X'' (with the sigma-algebra <math>\Sigma</math> held fixed). Note that this space is infinite-dimensional, and is commonly taken to be a [[Fréchet space]].  The points of ''S''(''X'') are measures.
 
Pick a point <math>\mu\in S(X)</math> and consider the [[tangent space]] <math>T_\mu S</math>. The Fisher information metric is then an [[inner product]] on the tangent space. With some [[abuse of notation]], one may write this as
 
:<math>g(\sigma_1,\sigma_2)=\int_X \frac{d\sigma_1}{d\mu}\frac{d\sigma_2}{d\mu}d\mu</math>
 
Here, <math>\sigma_1</math> and <math>\sigma_2</math> are vectors in the tangent space; that is,  <math>\sigma_1,\sigma_2\in T_\mu S</math>.  The abuse of notation is to write the tangent vectors as if they are derivatives, and to insert the extraneous ''d'' in writing the integral: the integration is meant to be carried out using the measure <math>\mu</math> over the whole space ''X''.
 
This definition of the metric can be seen to be equivalent to the previous one in several steps.  First, one selects a [[submanifold]] of ''S''(''X'') by considering only those measures <math>\mu</math> that are parameterized by some smoothly varying parameter <math>\theta</math>.  If <math>\theta</math> is finite-dimensional, then so is the submanifold; likewise, the tangent space has the same dimension as <math>\theta</math>.
 
With some additional abuse of language, one notes that the [[exponential map]] provides a map from vectors in a tangent space to points in an underlying manifold. Thus, if <math>\sigma\in T_\mu S</math> is a vector in the tangent space, then <math>p=\exp(\sigma)</math> is the corresponding probability associated with point <math>p\in S(X)</math> (after the [[parallel transport]] of the exponential map to <math>\mu</math>.) Conversely, given a point <math>p\in S(X)</math>, the logarithm gives a point in the tangent space (roughly speaking, as again, one must transport from the origin to point <math>\mu</math>; for details, refer to original sources).  Thus, one has the appearance of logarithms in the simpler definition, previously given.
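On a finite sample space, the abstract inner product can be written out explicitly, and choosing the tangent vectors <math>\sigma_j = \partial p/\partial\theta_j</math> recovers the coordinate form of the metric. The sketch below, assuming NumPy and the same illustrative categorical family as above, demonstrates this.

<syntaxhighlight lang="python">
import numpy as np

# Sketch: the abstract inner product g(sigma_1, sigma_2) = integral of
# (d sigma_1 / d mu)(d sigma_2 / d mu) d mu on a finite sample space, with
# mu = p(theta) and sigma_j = dp/dtheta_j, reduces to the coordinate Fisher metric.

def p_of_theta(theta):                               # categorical family on 3 outcomes
    return np.array([theta[0], theta[1], 1.0 - theta[0] - theta[1]])

theta, eps = np.array([0.2, 0.5]), 1e-6
mu = p_of_theta(theta)

# tangent vectors sigma_j = dp/dtheta_j, represented as signed measures on the sample space
sigmas = [(p_of_theta(theta + e) - p_of_theta(theta - e)) / (2 * eps)
          for e in (np.array([eps, 0.0]), np.array([0.0, eps]))]

g = np.array([[np.sum((s1 / mu) * (s2 / mu) * mu) for s2 in sigmas] for s1 in sigmas])
print(g)   # matches the coordinate Fisher metric g_jk for this family
</syntaxhighlight>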
 
==See also==
 
*[[Cramér–Rao bound]]
*[[Fisher information]]
*[[Hellinger distance]]
 
== References ==
{{reflist}}
* Edward H. Feng, Gavin E. Crooks, "[http://threeplusone.com/Feng2009.pdf  Far-from-equilibrium measurements of thermodynamic length]" (2009)  ''Physical Review E'' '''79''', 012104. DOI: 10.1103/PhysRevE.79.012104
 
* [[Shun'ichi Amari]] (1985) ''Differential-geometrical methods in statistics'', Lecture notes in statistics, Springer-Verlag, Berlin.
* Shun'ichi Amari, Hiroshi Nagaoka (2000) ''Methods of information geometry'', Translations of mathematical monographs; v. 191, American Mathematical Society.
*  Paolo Gibilisco, Eva Riccomagno, Maria Piera Rogantin and Henry P. Wynn, (2009) ''Algebraic and Geometric Methods in Statistics'', Cambridge U. Press, Cambridge.
 
[[Category:Differential geometry]]
[[Category:Statistical distance measures]]
