[[File:Empirical CDF.png|thumb|300px|The blue line shows an empirical distribution function. The black bars represent the samples corresponding to the empirical CDF, and the gray line is the true cumulative distribution function.]]
In [[statistics]], the '''empirical distribution function''', or '''empirical cdf''', is the [[cumulative distribution function]] associated with the [[empirical measure]] of the [[sample (statistics)|sample]]. This cdf is a [[step function]] that jumps up by 1/''n'' at each of the ''n'' data points. The empirical distribution function estimates the true underlying cdf of the points in the sample, and converges to it with probability 1 according to the [[Glivenko–Cantelli theorem]]. A number of results exist to quantify the rate of convergence of the empirical cdf to the underlying cdf.

== Definition ==

Let (''x''<sub>1</sub>, …, ''x''<sub>''n''</sub>) be [[iid]] real random variables with the common [[Cumulative distribution function|cdf]] ''F''(''t''). Then the '''empirical distribution function''' is defined as<ref name="vdv265">{{cite book |last=van der Vaart |first=A.W. |title=Asymptotic Statistics |year=1998 |publisher=Cambridge University Press |isbn=0-521-78450-6 |page=265}}</ref><ref>[http://planetmath.org/encyclopedia/EmpiricalDistributionFunction.html PlanetMath]</ref>

: <math>
\hat F_n(t) = \frac{\text{number of elements in the sample} \le t}{n} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{x_i \le t\},
</math>

where '''1'''{''A''} is the [[indicator function|indicator]] of [[event (probability theory)|event]] ''A''. For a fixed ''t'', the indicator '''1'''{''x<sub>i</sub>'' ≤ ''t''} is a [[Bernoulli distribution|Bernoulli]] random variable with parameter {{nowrap|''p'' {{=}} ''F''(''t'')}}, hence <math style="vertical-align:-.3em">\scriptstyle n \hat F_n(t)</math> is a [[binomial distribution|binomial]] random variable with [[mean]] ''nF''(''t'') and [[variance]] {{nowrap|''nF''(''t'')(1 − ''F''(''t''))}}. This implies that <math style="vertical-align:-.3em">\scriptstyle \hat F_n(t)</math> is an [[bias of an estimator|unbiased]] estimator for ''F''(''t'').
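
For illustration, the definition translates directly into a few lines of code. The following minimal Python sketch (the function name and sample values are arbitrary) evaluates the empirical distribution function of a small sample:

<syntaxhighlight lang="python">
import numpy as np

def ecdf(sample, t):
    """Empirical distribution function: the fraction of sample points <= t."""
    return np.mean(np.asarray(sample) <= t)

# The ECDF is a step function rising from 0 to 1 over the data (here n = 5).
x = [3.1, 1.4, 2.7, 5.0, 2.7]
for t in (1.0, 2.7, 6.0):
    print(t, ecdf(x, t))  # prints 0.0, 0.6 and 1.0 respectively
</syntaxhighlight>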

== Asymptotic properties ==

By the [[strong law of large numbers]], the estimator <math style="vertical-align:-.3em">\scriptstyle\hat{F}_n(t)</math> converges to ''F''(''t'') as {{nowrap|''n'' → ∞}} [[almost sure convergence|almost surely]], for every value of ''t'':<ref name="vdv265" />

: <math>
\hat F_n(t)\ \xrightarrow{a.s.}\ F(t),
</math>

thus the estimator <math style="vertical-align:-.3em">\scriptstyle\hat{F}_n(t)</math> is [[consistent estimator|consistent]]. This expression asserts the pointwise convergence of the empirical distribution function to the true cdf. There is a stronger result, called the [[Glivenko–Cantelli theorem]], which states that the convergence in fact happens uniformly over ''t'':<ref name="vdv266">{{cite book |last=van der Vaart |first=A.W. |title=Asymptotic Statistics |year=1998 |publisher=Cambridge University Press |isbn=0-521-78450-6 |page=266}}</ref>

: <math>
\|\hat F_n-F\|_\infty \equiv \sup_{t\in\mathbb{R}} \big|\hat F_n(t)-F(t)\big|\ \xrightarrow{a.s.}\ 0.
</math>

The sup-norm in this expression is called the [[Kolmogorov–Smirnov test|Kolmogorov–Smirnov statistic]] for testing the goodness of fit between the empirical distribution <math style="vertical-align:-.3em">\scriptstyle\hat{F}_n(t)</math> and the assumed true cdf ''F''. Other [[norm (mathematics)|norm function]]s may reasonably be used here instead of the sup-norm; for example, the [[Lp norm|L²-norm]] gives rise to the [[Cramér–von Mises criterion|Cramér–von Mises statistic]].
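
Since <math style="vertical-align:-.3em">\scriptstyle\hat{F}_n</math> is a step function, the supremum over ''t'' is attained at or just below one of the order statistics, so the statistic can be computed exactly from 2''n'' candidate values. An illustrative Python sketch (the standard normal serves only as an example of a hypothesized continuous cdf):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

def ks_statistic(sample, cdf):
    """sup_t |F_n(t) - F(t)| for a continuous hypothesized cdf F."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    f = cdf(x)
    # F_n equals i/n at the i-th order statistic and (i-1)/n just below it.
    upper = np.arange(1, n + 1) / n - f
    lower = f - np.arange(0, n) / n
    return max(upper.max(), lower.max())

rng = np.random.default_rng(0)
print(ks_statistic(rng.standard_normal(100), norm.cdf))
</syntaxhighlight>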

The asymptotic distribution can be further characterized in several different ways. First, the [[central limit theorem]] states that ''pointwise'', <math style="vertical-align:-.3em">\scriptstyle\hat{F}_n(t)</math> has an asymptotically normal distribution with the standard {{nowrap|√''n''}} rate of convergence:<ref name="vdv265"/>

: <math>
\sqrt{n}\big(\hat F_n(t) - F(t)\big)\ \ \xrightarrow{d}\ \ \mathcal{N}\Big( 0, F(t)\big(1-F(t)\big) \Big).
</math>
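
This pointwise normality can be checked numerically. An illustrative simulation sketch (the sample size, the point ''t'' and the normal parent distribution are arbitrary choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, reps, t = 500, 2000, 0.0
Ft = norm.cdf(t)  # true F(t) = 0.5 at t = 0

# Replicate sqrt(n) * (F_n(t) - F(t)) and compare its variance to F(t)(1 - F(t)).
vals = [np.sqrt(n) * (np.mean(rng.standard_normal(n) <= t) - Ft)
        for _ in range(reps)]
print(np.var(vals), Ft * (1 - Ft))  # both close to 0.25
</syntaxhighlight>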

This result is extended by [[Donsker's theorem]], which asserts that the ''[[empirical process]]'' <math style="vertical-align:-.3em">\scriptstyle\sqrt{n}(\hat{F}_n - F)</math>, viewed as a function indexed by {{nowrap|''t'' ∈ '''R'''}}, [[convergence in distribution|converges in distribution]] in the [[Skorokhod space]] {{nowrap|''D''[−∞, +∞]}} to the mean-zero [[Gaussian process]] {{nowrap|''G<sub>F</sub>'' {{=}} ''B''∘''F''}}, where ''B'' is the standard [[Brownian bridge]].<ref name="vdv266"/> The covariance structure of this Gaussian process is

: <math>
\mathrm{E}[\,G_F(t_1)G_F(t_2)\,] = F(t_1\wedge t_2) - F(t_1)F(t_2).
</math>
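
This covariance can be verified directly from the definition: for a single observation,

: <math>
\mathrm{E}[\,\mathbf{1}\{x_i \le t_1\}\,\mathbf{1}\{x_i \le t_2\}\,] = \Pr(x_i \le t_1 \wedge t_2) = F(t_1 \wedge t_2),
</math>

so the two indicators have covariance {{nowrap|''F''(''t''<sub>1</sub> ∧ ''t''<sub>2</sub>) − ''F''(''t''<sub>1</sub>)''F''(''t''<sub>2</sub>)}}, which by independence across observations is also the covariance of <math style="vertical-align:-.3em">\scriptstyle\sqrt{n}\hat{F}_n(t_1)</math> and <math style="vertical-align:-.3em">\scriptstyle\sqrt{n}\hat{F}_n(t_2)</math>.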

The uniform rate of convergence in Donsker's theorem can be quantified by a result known as the [[Hungarian embedding]]:<ref name="vdv268">{{cite book |last=van der Vaart |first=A.W. |title=Asymptotic Statistics |year=1998 |publisher=Cambridge University Press |isbn=0-521-78450-6 |page=268}}</ref>

: <math>
\limsup_{n\to\infty} \frac{\sqrt{n}}{\ln^2 n} \big\| \sqrt{n}(\hat F_n-F) - G_{F,n}\big\|_\infty < \infty, \quad \text{a.s.}
</math>

Alternatively, the rate of convergence of <math style="vertical-align:-.3em">\scriptstyle\sqrt{n}(\hat{F}_n-F)</math> can also be quantified in terms of the asymptotic behavior of the sup-norm of this expression. A number of results exist in this direction; for example, the [[Dvoretzky–Kiefer–Wolfowitz inequality]] provides a bound on the tail probabilities of <math style="vertical-align:-.3em">\scriptstyle\sqrt{n}\|\hat{F}_n-F\|_\infty</math>:<ref name="vdv268"/>

: <math>
\Pr\!\Big( \sqrt{n}\|\hat{F}_n-F\|_\infty > z \Big) \leq 2e^{-2z^2}.
</math>
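
Setting the right-hand side equal to a desired error level α and solving for ''z'' turns this inequality into a distribution-free confidence band: with probability at least {{nowrap|1 − α}}, the true cdf lies within <math style="vertical-align:-.3em">\scriptstyle\sqrt{\ln(2/\alpha)/(2n)}</math> of <math style="vertical-align:-.3em">\scriptstyle\hat{F}_n</math> simultaneously for all ''t''. An illustrative Python sketch of this construction (the function name and default level are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

def dkw_band(sample, alpha=0.05):
    """Uniform (1 - alpha) confidence band for F from the DKW inequality."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    # Solve 2 * exp(-2 * z**2) = alpha; the band half-width is z / sqrt(n).
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    fn = np.arange(1, n + 1) / n  # ECDF values at the sorted data points
    return x, np.clip(fn - eps, 0.0, 1.0), np.clip(fn + eps, 0.0, 1.0)
</syntaxhighlight>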

In fact, Kolmogorov has shown that if the cdf ''F'' is continuous, then the expression <math style="vertical-align:-.3em">\scriptstyle\sqrt{n}\|\hat{F}_n-F\|_\infty</math> converges in distribution to ||''B''||<sub>∞</sub>, which has the [[Kolmogorov distribution]] that does not depend on the form of ''F''.
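
An illustrative numerical check (assuming SciPy is available: <code>scipy.special.kolmogorov</code> evaluates the complementary cdf of the Kolmogorov distribution, and for large ''n'' the p-value reported by <code>scipy.stats.kstest</code> is approximately the Kolmogorov tail probability at {{nowrap|√''n''}} times the statistic):

<syntaxhighlight lang="python">
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(2)
sample = rng.standard_normal(1000)

# KS statistic and p-value for testing against the standard normal cdf.
d, p = stats.kstest(sample, "norm")

# Asymptotic p-value Pr(||B||_inf > sqrt(n) * d); the two should roughly agree.
print(p, special.kolmogorov(np.sqrt(len(sample)) * d))
</syntaxhighlight>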

Another result, which follows from the [[law of the iterated logarithm]], is that<ref name="vdv268"/>

: <math>
\limsup_{n\to\infty} \frac{\sqrt{n}\|\hat{F}_n-F\|_\infty}{\sqrt{2\ln\ln n}} \leq \frac12, \quad \text{a.s.}
</math>

and

: <math>
\liminf_{n\to\infty} \sqrt{2n\ln\ln n}\, \|\hat{F}_n-F\|_\infty = \frac{\pi}{2}, \quad \text{a.s.}
</math>

== See also ==

* [[Càdlàg]] functions
* [[Distribution fitting]]
* [[Dvoretzky–Kiefer–Wolfowitz inequality]]
* [[Empirical probability]]
* [[Empirical process]]
* [[Kaplan–Meier estimator]] for censored processes
* [[Survival function]]

== References ==
{{reflist}}

== Further reading ==
* {{cite book |last1=Shorack |first1=G.R. |last2=Wellner |first2=J.A. |title=Empirical Processes with Applications to Statistics |year=1986 |publisher=Wiley |location=New York |isbn=0-471-86725-X}}

== External links ==
{{commonscat|Cumulative frequency distribution|Empirical distribution functions}}

{{DEFAULTSORT:Empirical Distribution Function}}
[[Category:Data analysis]]
[[Category:Non-parametric statistics]]
[[Category:Empirical process]]