Connection form: Difference between revisions

{{Merge to|Mixture model|date=September 2010}}
 
{{See also|Mixture model}}
 
In [[probability]] and [[statistics]], a '''mixture distribution''' is the [[probability distribution]] of a [[random variable]] that is derived from a collection of other random variables as follows: first, one of the underlying random variables is selected at random, with a given probability of selection attached to each, and then the value of the selected random variable is realized. The underlying random variables may be [[random vector]]s (each having the same dimension), in which case the mixture distribution is a [[multivariate distribution]].
 
In cases where each of the underlying random variables is [[Continuous random variable|continuous]], the outcome variable will also be continuous and its [[probability density function]] is sometimes referred to as a '''mixture density'''. The [[cumulative distribution function]] (and the [[probability density function]] if it exists) can be expressed as a [[convex combination]] (i.e. a weighted sum, with non-negative weights that sum to 1) of other distribution functions and density functions. The individual distributions that are combined to form the mixture distribution are called the '''mixture components''', and the probabilities (or weights) associated with each component are called the '''mixture weights'''. The number of components in a mixture distribution is often restricted to being finite, although in some cases the components may be [[countable|countably infinite]]. More general cases (i.e. an [[uncountable]] set of component distributions), as well as the countable case, are treated under the title of [[compound distribution]]s.
 
A distinction needs to be made between a [[random variable]] whose distribution function or density is the sum of a set of components  (i.e. a mixture distribution) and a random variable whose value is the sum of the values of two or more underlying random variables, in which case the distribution is given by the [[convolution]] operator.  As an example, the sum of two [[Multivariate normal distribution|jointly normally distributed]] random variables, each with different means, will still have a normal distribution.  On the other hand, a mixture density created as a mixture of two normal distributions with different means will have two peaks provided that the two means are far enough apart, showing that this distribution is radically different from a normal distribution.
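
The distinction can be illustrated with a small simulation. The following is a minimal sketch, not taken from the cited sources: the function names, the seed, the sample size, and the choice of means 0 and 6 with unit standard deviations are all illustrative assumptions, and the two summands are assumed independent (a special case of joint normality).

```python
import random
import statistics

def sample_mixture(rng, weights, means, sds):
    """Draw one value: pick a component by its weight, then sample from it."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[i], sds[i])

def sample_sum(rng, means, sds):
    """Draw one value of X1 + X2 for independent normal X1 and X2."""
    return sum(rng.gauss(m, s) for m, s in zip(means, sds))

rng = random.Random(0)  # fixed seed so the run is repeatable
means, sds = [0.0, 6.0], [1.0, 1.0]
mix = [sample_mixture(rng, [0.5, 0.5], means, sds) for _ in range(20000)]
tot = [sample_sum(rng, means, sds) for _ in range(20000)]

# Share of draws within 1 unit of each sample's theoretical mean
# (3 for the mixture, 6 for the sum).
frac_mid_mix = sum(abs(x - 3.0) < 1.0 for x in mix) / len(mix)
frac_mid_sum = sum(abs(x - 6.0) < 1.0 for x in tot) / len(tot)

print(statistics.mean(mix), statistics.mean(tot))
print(frac_mid_mix, frac_mid_sum)
```

In this setup the mixture places very little mass near its own mean (the valley between the two peaks), whereas the sum is concentrated around its mean, as expected for a single normal distribution.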
 
Mixture distributions arise in many contexts in the literature, most naturally where a [[statistical population]] contains two or more [[subpopulation]]s. They are also sometimes used as a means of representing non-normal distributions. Data analysis concerning [[statistical model]]s involving mixture distributions is discussed under the title of [[mixture model]]s, while the present article concentrates on simple probabilistic and statistical properties of mixture distributions and how these relate to properties of the underlying distributions.
 
== Finite and countable mixtures ==
[[Image:Gaussian-mixture-example.svg|thumb|Density of a mixture of three normal distributions (μ&nbsp;=&nbsp;5, 10, 15, σ&nbsp;=&nbsp;2) with equal weights. Each component is shown as a weighted density (each integrating to 1/3)]] 
 
Given a finite set of probability density functions ''p''<sub>1</sub>(''x''), …, ''p<sub>n</sub>''(''x''), or corresponding cumulative distribution functions  ''P''<sub>1</sub>(''x''), …, ''P<sub>n</sub>''(''x'') and '''weights''' ''w''<sub>1</sub>, …, ''w<sub>n</sub>'' such that {{nowrap|''w<sub>i</sub>'' ≥ 0}} and {{nowrap|∑''w<sub>i</sub>'' {{=}} 1, }} the mixture distribution can be represented by writing either the density, ''f'',  or the distribution function, ''F'', as a sum (which in both cases is a convex combination):
:<math> F(x) = \sum_{i=1}^n \, w_i \, P_i(x), </math>
:<math> f(x) = \sum_{i=1}^n \, w_i \, p_i(x) .</math>
This type of mixture, being a finite sum, is called a '''finite mixture,''' and in applications, an unqualified reference to a "mixture density" usually means a finite mixture. The case of a countably infinite set of components is covered formally by allowing <math> n = \infty\!</math>.
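
The two sums can be evaluated directly. The sketch below is illustrative only: it uses the three-component normal mixture from the figure caption (means 5, 10, 15, standard deviation 2, equal weights), and the function names <code>mixture_pdf</code> and <code>mixture_cdf</code> are assumptions introduced here.

```python
from statistics import NormalDist

def mixture_pdf(x, weights, components):
    """f(x) = sum_i w_i * p_i(x)."""
    return sum(w * c.pdf(x) for w, c in zip(weights, components))

def mixture_cdf(x, weights, components):
    """F(x) = sum_i w_i * P_i(x)."""
    return sum(w * c.cdf(x) for w, c in zip(weights, components))

weights = [1 / 3, 1 / 3, 1 / 3]
components = [NormalDist(mu, 2.0) for mu in (5.0, 10.0, 15.0)]

print(mixture_pdf(10.0, weights, components))
print(mixture_cdf(10.0, weights, components))  # symmetric about 10, so close to 0.5
```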
 
== Uncountable mixtures ==
{{Main|compound distribution}}
 
Where the set of component distributions is [[uncountable]], the result is often called a [[compound probability distribution]]. The construction of such distributions has a formal similarity to that of mixture distributions, with either infinite summations or integrals replacing the finite summations used for finite mixtures.
 
Consider a probability density function ''p''(''x'';''a'') for a variable ''x'', parameterized by ''a''. That is, for each value of ''a'' in some set ''A'', ''p''(''x'';''a'') is a probability density function with respect to ''x''. Given a probability density function ''w'' (meaning that ''w'' is nonnegative and integrates to 1), the function
 
:<math> f(x) = \int_A \, w(a) \, p(x;a) \, da </math>
 
is again a probability density function for ''x''. A similar integral can be written for the cumulative distribution function. Note that the formulae here reduce to the case of a finite or infinite mixture if the density ''w'' is allowed to be a [[generalized function]] representing the "derivative" of the cumulative distribution function of a [[discrete distribution]].
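
As a numerical illustration, the integral can be approximated by a midpoint rule. The sketch below assumes a specific example that is not in the text above: ''p''(''x'';''a'') is a normal density with mean ''a'' and standard deviation 1, and ''w'' is a normal density with mean 0 and standard deviation 2. For this choice the result is known to be a normal density with mean 0 and variance 1 + 4 = 5, which gives something to compare against. The function name and integration limits are illustrative.

```python
import math
from statistics import NormalDist

def compound_pdf(x, w, p, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of w(a) * p(x, a) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        a = lo + (k + 0.5) * h
        total += w(a) * p(x, a)
    return h * total

w = NormalDist(0.0, 2.0).pdf          # mixing density over the parameter a

def p(x, a):
    """Component density: normal in x with mean a and unit standard deviation."""
    return NormalDist(a, 1.0).pdf(x)

x = 1.3
approx = compound_pdf(x, w, p, -20.0, 20.0)
exact = NormalDist(0.0, math.sqrt(5.0)).pdf(x)
print(approx, exact)
```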
 
==Mixtures of parametric families==
The mixture components are often not arbitrary probability distributions, but instead are members of a [[parametric family]] (such as normal distributions), with different values for a parameter or parameters. In such cases the density, assuming that it exists, can be written as a sum:
:<math> f(x; a_1, \ldots , a_n) = \sum_{i=1}^n \, w_i \, p(x;a_i) </math>
for one parameter, or
:<math> f(x; a_1, \ldots , a_n, b_1, \ldots , b_n) = \sum_{i=1}^n \, w_i \, p(x;a_i,b_i) </math>
for two parameters, and so forth.
 
== Properties ==
=== Convexity ===
A general [[linear combination]] of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a [[convex combination]] of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.
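
A small numerical check makes the contrast concrete. In this sketch the two normal components and both weight vectors are illustrative assumptions. Both weight vectors sum to 1, but only the first is a convex combination.

```python
from statistics import NormalDist

def combination(x, weights, components):
    """Weighted sum of component densities; the weights need not be convex."""
    return sum(w * c.pdf(x) for w, c in zip(weights, components))

components = [NormalDist(0.0, 1.0), NormalDist(0.0, 0.2)]
convex = [0.5, 0.5]        # non-negative and sums to 1
non_convex = [1.5, -0.5]   # sums to 1, but one weight is negative

print(combination(0.0, convex, components))      # positive
print(combination(0.0, non_convex, components))  # negative, so not a density
```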
 
=== Moments ===
Let ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> denote random variables from the ''n'' component distributions, and let ''X'' denote a random variable from the mixture distribution. Then, for any function ''H''(·) for which <math>\operatorname{E}[H(X_i)]</math> exists, and assuming that the component densities ''p<sub>i</sub>''(''x'') exist,
 
:<math>
\begin{align}
\operatorname{E}[H(X)] & = \int_{-\infty}^\infty H(x) \sum_{i = 1}^n w_i p_i(x) \, dx \\
& = \sum_{i = 1}^n w_i \int_{-\infty}^\infty p_i(x) H(x) \, dx = \sum_{i = 1}^n w_i \operatorname{E}[H(X_i)].
\end{align}
</math>
 
The relation,
 
:<math> \operatorname{E}[H(X)] =  \sum_{i = 1}^n w_i \operatorname{E}[H(X_i)],</math>
 
holds more generally.
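
The relation can be checked numerically. The following sketch assumes a two-component normal mixture and the function ''H''(''x'') = ''x''<sup>2</sup>, both chosen only for illustration. The helper <code>expect</code> is a hypothetical name for a simple midpoint-rule integrator.

```python
from statistics import NormalDist

def expect(h, pdf, lo=-12.0, hi=12.0, steps=20000):
    """Midpoint-rule approximation of the integral of h(x) * pdf(x) over [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * step
        total += h(x) * pdf(x)
    return step * total

weights = [0.3, 0.7]
components = [NormalDist(-1.0, 1.0), NormalDist(2.0, 0.5)]

def mixture_pdf(x):
    return sum(w * c.pdf(x) for w, c in zip(weights, components))

def H(x):
    return x * x

lhs = expect(H, mixture_pdf)                                           # E[H(X)]
rhs = sum(w * expect(H, c.pdf) for w, c in zip(weights, components))  # sum of w_i E[H(X_i)]
print(lhs, rhs)
```

For normal components the second moment is ''μ<sub>i</sub>''<sup>2</sup> + ''σ<sub>i</sub>''<sup>2</sup>, so both sides should be close to 0.3 × 2 + 0.7 × 4.25 = 3.575.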
 
In particular, the ''j''<sup>th</sup> moment about zero (i.e. choosing {{nowrap|''H''(''x'') {{=}} ''x<sup>j</sup>''}}) is simply a weighted average of the ''j''<sup>th</sup> moments of the components. Moments about the mean, {{nowrap|''H''(''x'') {{=}} (''x − μ'')<sup>''j''</sup>}}, involve a binomial expansion:<ref>Frühwirth-Schnatter (2006, Ch.1.2.4)</ref>
 
:<math>
\begin{align}
\operatorname{E}[(X - \mu)^j] & = \sum_{i = 1}^n w_i \operatorname{E}[(X_i - \mu_i + \mu_i - \mu)^j] \\
& = \sum_{i=1}^n \sum_{k=0}^j \left( \begin{array}{c} j \\ k \end{array} \right) (\mu_i - \mu)^{j-k} w_i \operatorname{E}[(X_i- \mu_i)^k],
\end{align}
</math>
 
where ''μ<sub>i</sub>'' denotes the mean of the ''i''<sup>th</sup> component.
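
The expansion translates directly into a double loop. The sketch below is an illustration under one added assumption: the components are normal, so their central moments are 0 for odd ''k'' and ''σ''<sup>''k''</sup>(''k'' − 1)!! for even ''k''. The function names are hypothetical.

```python
from math import comb

def normal_central_moment(sigma, k):
    """E[(X_i - mu_i)^k] for a normal component: 0 if k is odd, sigma^k * (k-1)!! if even."""
    if k % 2:
        return 0.0
    double_factorial = 1
    for m in range(k - 1, 0, -2):
        double_factorial *= m
    return sigma ** k * double_factorial

def mixture_central_moment(j, weights, means, sigmas):
    """j-th central moment of a normal mixture via the binomial expansion."""
    mu = sum(w * m for w, m in zip(weights, means))
    total = 0.0
    for w, m, s in zip(weights, means, sigmas):
        for k in range(j + 1):
            total += comb(j, k) * (m - mu) ** (j - k) * w * normal_central_moment(s, k)
    return total

weights = [1 / 3, 1 / 3, 1 / 3]
means = [5.0, 10.0, 15.0]
sigmas = [2.0, 2.0, 2.0]
print([mixture_central_moment(j, weights, means, sigmas) for j in range(5)])
```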
 
In case of a mixture of one-dimensional [[normal distribution]]s with weights ''w<sub>i</sub>'', means ''μ<sub>i</sub>'' and variances ''σ<sub>i</sub>''<sup>2</sup>, the total mean and variance will be:
: <math> \operatorname{E}[X] = \mu = \sum_{i = 1}^n w_i \mu_i ,</math>
: <math> \operatorname{E}[(X - \mu)^2] = \sigma^2 = \sum_{i=1}^n w_i((\mu_i - \mu)^{2} + \sigma_i^2) .</math>
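
As a worked illustration (not taken from the cited sources), consider the three-component mixture shown in the figure in the section on finite mixtures: equal weights 1/3, means 5, 10 and 15, and common standard deviation 2. The formulas above give
: <math> \mu = \tfrac{1}{3}(5 + 10 + 15) = 10, </math>
: <math> \sigma^2 = \tfrac{1}{3}\left[ (25 + 4) + (0 + 4) + (25 + 4) \right] = \tfrac{62}{3} \approx 20.67, </math>
so the mixture variance exceeds the common component variance of 4 by 50/3, the weighted spread of the component means about the overall mean.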
 
These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as [[skewness]] and [[kurtosis]] ('''[[fat tail]]s''') and multi-modality, even in the absence of such features within the components themselves.  Marron and Wand (1992) give an illustrative account of the flexibility of this framework.
 
===Modes===
 
The question of [[Multimodal distribution|multimodality]] is simple for some cases, such as mixtures of [[exponential distribution]]s: all such mixtures are [[Unimodality|unimodal]].<ref>Frühwirth-Schnatter (2006, Ch.1)</ref> However, for the case of mixtures of [[normal distribution]]s, it is a complex one. Conditions for the number of modes in a multivariate normal mixture are explored by Ray and Lindsay<ref name=rayLindsay>{{cite
|title=The topography of multivariate normal mixtures|
last1=Ray |first1=R.|
last2=Lindsay |first2= B.
|year=2005
|journal = The Annals of Statistics
|volume  =33
|number  =5
|pages  =2042–2065
}}</ref> extending the earlier work on univariate <ref name=Robertson1969>Robertson CA, Fryer JG (1969) Some descriptive properties of normal mixtures. Skand Aktuarietidskr 137–146</ref><ref name=Behboodian1970>Behboodian J (1970) On the modes of a mixture of two normal distributions. Technometrics 12: 131–139</ref> and multivariate distributions (Carreira-Perpinan and Williams, 2003){{full|date=November 2012}}.
 
Here the problem of evaluating the modes of an ''n''-component mixture in a ''D''-dimensional space is reduced to the identification of critical points (local minima, maxima and saddle points) on a [[manifold]] referred to as the ridgeline surface, which is the image of the ridgeline function
:<math> x^{*}(\alpha) = \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \right]^{-1} \times \left[  \sum_{i=1}^{n}  \alpha_i \Sigma_i^{-1} \mu_i \right],
</math>
where ''α'' belongs to the {{nowrap|''n'' − 1}} dimensional '''unit simplex'''
<math> \mathcal{S}_n  =
\{ \alpha \in \mathbb{R}^n: \alpha_i \in [0,1], \sum_{i=1}^n \alpha_i = 1 \}
</math>
and {{nowrap|Σ<sub>''i''</sub> ∈ '''R'''<sup>''D'' × ''D''</sup>, ''μ<sub>i</sub>'' ∈ '''R'''<sup>''D''</sup>}} correspond to the covariance and mean of the ''i''<sup>th</sup> component. Ray and Lindsay{{citation needed|date=December 2010}} consider the case in which {{nowrap|''n'' − 1 < ''D''}}, showing a one-to-one correspondence between the modes of the mixture and those of the '''elevation function''' {{nowrap|''h''(''α'') {{=}} ''q''(''x*''(''α''))}}, where ''q'' denotes the mixture density.
One may therefore identify the modes by solving <math>  \frac{d h(\alpha)}{d \alpha} = 0 </math> with respect to ''α'' and determining the corresponding value ''x*''(''α'').
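
The ridgeline function itself is straightforward to evaluate. The following sketch is restricted to ''D'' = 2 so that it needs no external libraries. The helper names (<code>inv2</code>, <code>matvec2</code>, <code>ridgeline</code>) and the example means and covariances are illustrative assumptions, not part of Ray and Lindsay's presentation.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def ridgeline(alpha, means, covs):
    """x*(alpha) = [sum_i alpha_i S_i^-1]^-1 [sum_i alpha_i S_i^-1 mu_i] for D = 2."""
    precisions = [inv2(s) for s in covs]
    lhs = [[sum(a * p[r][c] for a, p in zip(alpha, precisions)) for c in range(2)]
           for r in range(2)]
    rhs = [0.0, 0.0]
    for a, p, mu in zip(alpha, precisions, means):
        p_mu = matvec2(p, mu)
        rhs[0] += a * p_mu[0]
        rhs[1] += a * p_mu[1]
    return matvec2(inv2(lhs), rhs)

identity = [[1.0, 0.0], [0.0, 1.0]]
means = [[0.0, 0.0], [4.0, 2.0]]
# With equal covariances the ridgeline is the straight segment between the means.
print(ridgeline([0.25, 0.75], means, [identity, identity]))
```

At a vertex of the simplex (one ''α<sub>i</sub>'' equal to 1) the function returns the corresponding component mean, so the ridgeline surface always contains the component means.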
 
Using graphical tools, the potential multi-modality of mixtures with {{nowrap|''n'' {{=}} 2}} and {{nowrap|''n'' {{=}} 3}} components is demonstrated; in particular, it is shown that the number of modes may exceed ''n'' and that the modes need not coincide with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned differential equation with respect to ''w''<sub>1</sub> and expressing the solutions as a function Π(''α''), {{nowrap|''α'' ∈ [0, 1]}}, so that the number and location of modes for a given value of ''w''<sub>1</sub> corresponds to the number of intersections of the graph with the line {{nowrap|Π(''α'') {{=}} ''w''<sub>1</sub>}}. This in turn can be related to the number of oscillations of the graph and therefore to solutions of <math> \frac{d \Pi(\alpha)}{d \alpha} = 0 </math>, leading to an explicit solution for a two-component [[homoscedastic]] mixture given by
:<math>  1 - \alpha(1-\alpha) d_M(\mu_1, \mu_2, \Sigma)^2 </math>
where <math> d_M(\mu_1, \mu_2, \Sigma) = \sqrt{(\mu_2 - \mu_1)^T \Sigma^{-1} (\mu_2 - \mu_1)} </math> is the [[Mahalanobis distance]] between ''μ''<sub>1</sub> and ''μ''<sub>2</sub>.
 
Since the above is quadratic in ''α'', it follows that in this instance there are at most two modes irrespective of the dimension or the weights.
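
The zeros of this quadratic can be found in closed form. The sketch below is illustrative only. Here <code>d</code> stands for the quantity ''d<sub>M</sub>'' that is squared in the expression above, and the function name is hypothetical. Setting the expression to zero gives ''α''<sup>2</sup> − ''α'' + 1/''d''<sup>2</sup> = 0, which has real roots only when ''d'' ≥ 2.

```python
import math

def quadratic_roots(d):
    """Real roots in [0, 1] of 1 - a * (1 - a) * d**2 = 0, sorted ascending."""
    d2 = d ** 2
    if d2 == 0:
        return []          # the expression is identically 1, so it has no zeros
    disc = 1.0 - 4.0 / d2
    if disc < 0:
        return []          # no real roots when d < 2
    root = math.sqrt(disc)
    return sorted({(1.0 - root) / 2.0, (1.0 + root) / 2.0})

for d in (1.0, 2.0, 4.0):
    print(d, quadratic_roots(d))
```

Taking ''d<sub>M</sub>'' to be the Mahalanobis distance in its usual sense (for a univariate mixture, |''μ''<sub>1</sub> − ''μ''<sub>2</sub>|/''σ''), the absence of real roots for ''d'' < 2 is consistent with the separation threshold of two standard deviations quoted for equal-weight mixtures in the Examples section.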
 
== Examples ==
Simple examples can be given by a mixture of two normal distributions.
 
Given an equal (50/50) mixture of two normal distributions with the same standard deviation and different means ([[homoscedastic]]), the overall distribution will exhibit low [[kurtosis]] relative to a single normal distribution – the means of the subpopulations fall on the shoulders of the overall distribution. If the means are sufficiently separated, namely by more than twice the (common) standard deviation, so that <math>\left|\mu_1 - \mu_2\right| > 2\sigma,</math> the mixture is a [[bimodal distribution]]; otherwise it simply has a wide peak.<ref name="Schilling2002">{{Cite journal|title=Is Human Height Bimodal?|first1=Mark F. |last1=Schilling |first2= Ann E.| last2=Watkins |first3=William |last3=Watkins| journal=[[The American Statistician]]| doi=10.1198/00031300265 |volume=56 |year=2002| pages=223–229 |issue=3}}</ref> The variance of the overall population will also be greater than the variance of either subpopulation (due to the spread between the different means), so the mixture exhibits [[overdispersion]] relative to a normal distribution with fixed standard deviation <math>\sigma,</math> though it is not overdispersed relative to a normal distribution with variance equal to that of the overall population.
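
The separation threshold can be checked numerically by counting local maxima of the density on a grid. This is a rough sketch under stated assumptions: the helper names, the grid range, the grid resolution, and the two test separations (1.5''σ'' and 3''σ'') are all illustrative choices.

```python
from statistics import NormalDist

def count_modes(pdf, lo, hi, steps=4000):
    """Count strict local maxima of pdf on an evenly spaced grid over [lo, hi]."""
    step = (hi - lo) / steps
    ys = [pdf(lo + k * step) for k in range(steps + 1)]
    return sum(ys[k - 1] < ys[k] > ys[k + 1] for k in range(1, steps))

def equal_mixture_pdf(mu1, mu2, sigma):
    """Density of a 50/50 mixture of two normals with a common standard deviation."""
    a, b = NormalDist(mu1, sigma), NormalDist(mu2, sigma)
    return lambda x: 0.5 * a.pdf(x) + 0.5 * b.pdf(x)

narrow = count_modes(equal_mixture_pdf(0.0, 1.5, 1.0), -6.0, 8.0)  # separation below 2 sigma
wide = count_modes(equal_mixture_pdf(0.0, 3.0, 1.0), -6.0, 8.0)    # separation above 2 sigma
print(narrow, wide)
```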
 
Alternatively, given two subpopulations with the same mean and different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and heavier tails (and correspondingly shallower shoulders) than a single distribution.
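
For normal components with a common mean, the kurtosis can be computed in closed form from the moment formulas in the Properties section. The sketch below assumes normal components, for which the fourth central moment is 3''σ<sub>i</sub>''<sup>4</sup>. The function name and the example standard deviations (1 and 3) are illustrative.

```python
def scale_mixture_kurtosis(weights, sigmas):
    """Kurtosis of a mixture of normals sharing one mean but differing in scale."""
    variance = sum(w * s ** 2 for w, s in zip(weights, sigmas))
    fourth = sum(w * 3.0 * s ** 4 for w, s in zip(weights, sigmas))
    return fourth / variance ** 2

print(scale_mixture_kurtosis([1.0], [2.0]))            # a single normal has kurtosis 3
print(scale_mixture_kurtosis([0.5, 0.5], [1.0, 3.0]))  # exceeds 3: heavier tails
```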
 
<gallery>
File:Bimodal.png|Univariate mixture distribution, showing bimodal distribution
File:Bimodal-bivariate-small.png|Multivariate mixture distribution, showing four modes
</gallery>
 
== Applications ==
{{details|Mixture model}}
{{Expand section|date=March 2009}}
Mixture densities are complicated densities expressible in terms of simpler densities (the mixture components). They are used both because they provide a good model for certain data sets (where different subsets of the data exhibit different characteristics and can best be modeled separately) and because they can be more mathematically tractable, since the individual mixture components can be studied more easily than the overall mixture density.
 
Mixture densities can be used to model a [[statistical population]] with [[subpopulation]]s, where the mixture components are the densities on the subpopulations, and the weights are the proportions of each subpopulation in the overall population.
 
Mixture densities can also be used to model [[experimental error]] or contamination – one assumes that most of the samples measure the desired phenomenon, while the remainder come from a different, contaminating distribution.
 
Parametric statistics that assume no error often fail on such mixture densities – for example, statistics that assume normality often fail disastrously in the presence of even a few [[outliers]] – and instead one uses [[robust statistics]].
 
In [[meta-analysis]] of separate studies, [[study heterogeneity]] causes the distribution of results to be a mixture distribution, and leads to [[overdispersion]] of results relative to the predicted error. For example, in a [[statistical survey]], the [[margin of error]] (determined by sample size) predicts the [[sampling error]] and hence the dispersion of results on repeated surveys. The presence of study heterogeneity (studies have different [[sampling bias]]) increases the dispersion relative to the margin of error.
 
== See also ==
* [[Convex combination]]
* [[Expectation-maximization algorithm]]
* [[Fat tail]]
* Not to be confused with: [[List of convolutions of probability distributions]]
 
=== Mixture ===
* [[Mixture (probability)]]
* [[Mixture model]]
 
=== Hierarchical models ===
* [[Graphical model]]
* [[Hierarchical Bayes model]]
 
==Notes==
{{reflist}}
 
== References ==
*{{cite
|title=Finite Mixture and Markov Switching Models|
last1=Frühwirth-Schnatter |first1=Sylvia
|date=2006
|publisher=Springer
|isbn=978-1-4419-2194-9
}}
 
{{DEFAULTSORT:Mixture Density}}
[[Category:Probability distributions]]
[[Category:Systems of probability distributions]]
 
[[de:Mischverteilung]]
[[fr:Densité mélange]]

Revision as of 23:10, 28 February 2014
