{{move portions from|Sample standard deviation|date=April 2013}}
{{move portions from|Sample variance|date=April 2013}}

In [[statistics]] and in particular [[statistical theory]], '''unbiased estimation of a standard deviation''' is the calculation from a [[statistical sample]] of an estimated value of the [[standard deviation]] (a measure of [[statistical dispersion]]) of a [[statistical population|population]] of values, in such a way that the [[expected value]] of the calculation equals the true value. Except in some important situations, outlined later, the task has little relevance to applications of statistics since its need is avoided by standard procedures, such as the use of [[significance test]]s and [[confidence intervals]], or by using [[Bayesian analysis]].

However, for statistical theory it provides an exemplar problem in the context of [[estimation theory]] which is both simple to state and for which results cannot be obtained in closed form. It also provides an example where imposing the requirement for [[Bias of an estimator|unbiased estimation]] might be seen as just adding inconvenience, with no real benefit.

==Background==
In [[statistics]], the [[standard deviation]] of a population of numbers is often estimated from a [[random sample]] drawn from the population. The most common measure used is the sample standard deviation, which is defined by

:<math>
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}\,,
</math>

where <math>\{x_1,x_2,\ldots,x_n\}</math> is the sample (formally, realizations from a [[random variable]] ''X'') and <math>\overline{x}</math> is the [[Sample mean and sample covariance|sample mean]].

One way of seeing that this is a [[biased estimator]] of the standard deviation of the population is to start from the result that ''s''<sup>2</sup> is an [[unbiased estimator]] for the [[variance]] ''σ''<sup>2</sup> of the underlying population if that variance exists and the sample values are drawn independently with replacement. The square root is a nonlinear function, and only linear functions commute with taking the expectation. Since the square root is a concave function, it follows from [[Jensen's inequality]] that the square root of the sample variance is an underestimate.

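This underestimation can be checked directly by simulation. The following minimal Python sketch (assuming the NumPy library; the sample size and replication count are arbitrary illustrative choices) draws repeated samples from a standard normal population, for which both ''σ''<sup>2</sup> and ''σ'' equal 1:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# 200,000 samples of size n = 5 from N(0, 1)
n = 5
samples = rng.normal(0.0, 1.0, size=(200_000, n))
s = np.std(samples, axis=1, ddof=1)   # sample SD with Bessel's correction (n - 1)

print(np.mean(s**2))   # close to 1.0: s^2 is unbiased for the variance
print(np.mean(s))      # close to 0.94: s underestimates the standard deviation
</syntaxhighlight>

The average of ''s'' settles near 0.94 rather than 1, matching the correction factor ''c''<sub>4</sub>(5) ≈ 0.94 derived in the next section.
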
The use of ''n'' − 1 instead of ''n'' in the formula for the sample variance is known as [[Bessel's correction]], which corrects the bias in the estimation of the population ''variance'' and some, but not all, of the bias in the estimation of the population ''standard deviation''.

It is not possible to find an estimate of the standard deviation which is unbiased for all population distributions, as the bias depends on the particular distribution. Much of the following relates to estimation assuming a [[normal distribution]].

==Bias correction==

===Results for the normal distribution===
[[Image:Stddevc4factor.jpg|thumb|Correction factor ''c''<sub>4</sub> versus sample size ''n''.]]

When the random variable is [[normal distribution|normally distributed]], a minor correction exists to eliminate the bias. To derive the correction, note that for normally distributed ''X'', [[Cochran's theorem]] implies that <math>(n-1)s^2/\sigma^2</math> has a [[chi-squared distribution]] with {{nowrap|''n'' − 1}} degrees of freedom, so that <math>\sqrt{n-1}\,s/\sigma</math> has a [[chi distribution]] with {{nowrap|''n'' − 1}} degrees of freedom. Consequently,

:<math>\operatorname{E}[s] = c_4(n)\sigma \,</math>

where the correction factor ''c''<sub>4</sub>(''n'') is the scaled mean of the chi distribution with {{nowrap|''n'' − 1}} degrees of freedom, <math>\mu_1(n-1)/\sqrt{n-1}.</math> This depends on the sample size ''n'' and is given as follows:{{citation needed|date=December 2013}}

:<math>c_4(n)\,=\,\sqrt{\frac{2}{n-1}}\,\,\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}
\, = \, 1 - \frac{1}{4n} - \frac{7}{32n^2} - \frac{19}{128n^3} + O(n^{-4})</math>

and Γ(·) is the [[gamma function]]. An unbiased estimator of ''σ'' can be obtained by dividing ''s'' by ''c''<sub>4</sub>(''n''). As ''n'' grows large, ''c''<sub>4</sub>(''n'') approaches 1, and even for small ''n'' the correction is minor. The figure shows a plot of ''c''<sub>4</sub>(''n'') versus sample size. The table below gives numerical values of ''c''<sub>4</sub> and algebraic expressions for some values of ''n''; more complete tables may be found in most textbooks{{Citation needed|date=October 2010}} on [[statistical quality control]].

{| class="wikitable" style="text-align:center; width:600px; height:200px; margin: 1em auto 1em auto"
|-
! Sample size
! Expression of ''c''<sub>4</sub>
! Numerical value
|-
| 2
| <math>\sqrt{\frac{2}{\pi}}</math>
| 0.7978845608
|-
| 3
| <math>\frac{\sqrt{\pi}}{2}</math>
| 0.8862269255
|-
| 4
| <math>2\,\sqrt{\frac{2}{3\pi}}</math>
| 0.9213177319
|-
| 5
| <math>\frac{3}{4}\,\sqrt{\frac{\pi}{2}}</math>
| 0.9399856030
|-
| 6
| <math>\frac{8}{3}\,\sqrt{\frac{2}{5\pi}}</math>
| 0.9515328619
|-
| 7
| <math>\frac{5 \sqrt{3\pi}}{16}</math>
| 0.9593687891
|-
| 8
| <math>\frac{16}{5}\,\sqrt{\frac{2}{7\pi}}</math>
| 0.9650304561
|-
| 9
| <math>\frac{35 \sqrt{\pi}}{64}</math>
| 0.9693106998
|-
| 10
| <math>\frac{128}{105}\,\sqrt{\frac{2}{\pi}}</math>
| 0.9726592741
|-
| 100
|
| 0.9974779761
|-
| 1000
|
| 0.9997497811
|-
| 10000
|
| 0.9999749978
|-
| ''n'' = 2''k''
| <math>\sqrt{\frac{2}{\pi \left ( 2k-1 \right )}}\,\frac{2^{2k-2}\left ( k-1 \right )!^{2}}{\left ( 2k-2 \right )!}</math>
|
|-
| ''n'' = 2''k'' + 1
| <math>\sqrt{\frac{\pi}{ k }}\,\frac{\left ( 2k-1 \right )!}{2^{2k-1}\left ( k-1 \right )!^{2}}</math>
|
|}

It is important to keep in mind that this correction only produces an unbiased estimator for normally and independently distributed ''X''. When this condition is satisfied, another result about ''s'' involving ''c''<sub>4</sub>(''n'') is that the [[standard error (statistics)|standard error]] of ''s'' is<ref>Duncan, A. J., ''Quality Control and Industrial Statistics'', 4th Ed., Irwin (1974), ISBN 0-256-01558-9, p. 139.</ref><ref>N. L. Johnson, S. Kotz, and N. Balakrishnan, ''Continuous Univariate Distributions, Volume 1'', 2nd edition, Wiley and Sons, 1994, ISBN 0-471-58495-9, Chapter 13, Section 8.2.</ref> <math>\sigma\sqrt{1-c_4^{2}}</math>, while the standard error of the unbiased estimator is <math>\sigma\sqrt{c_4^{-2}-1} .</math>

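For computation, the gamma-function expression for ''c''<sub>4</sub>(''n'') can be evaluated directly. Below is a minimal Python sketch (assuming the NumPy and SciPy libraries; the log-gamma form is used only to avoid overflow for large ''n''):

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import gammaln

def c4(n):
    """c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma for stability."""
    return np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2.0) - gammaln((n - 1) / 2.0))

def unbiased_std(x):
    """Unbiased estimate of sigma for normally distributed data: s / c4(n)."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1))   # sample standard deviation
    return s / c4(x.size)

print(c4(2), c4(10))   # 0.7978845608... and 0.9726592741..., matching the table above
</syntaxhighlight>
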
===Rule of thumb for the normal distribution===
If calculation of the function ''c''<sub>4</sub>(''n'') appears too difficult, a simple rule of thumb<ref>Richard M. Brugger, "A Note on Unbiased Estimation of the Standard Deviation", ''The American Statistician'', 23 (4), p. 32 (1969).</ref> is to take the estimator

: <math>
\hat\sigma = \sqrt{ \frac{1}{n-1.5} \sum_{i=1}^n(x_i - \bar{x})^2 } .
</math>

The formula differs from the familiar expression for ''s''<sup>2</sup> only by having {{nowrap|''n'' − 1.5}} instead of {{nowrap|''n'' − 1}} in the denominator. This expression is only approximate; in fact,

: <math>
\operatorname{E}[\hat\sigma] = \sigma\cdot\Big( 1 + \frac{1}{16n^2} + \frac{3}{16n^3} + O(n^{-4}) \Big).
</math>

The remaining bias is relatively small: for {{nowrap|''n'' {{=}} 3}} it is about 2.3%, and for {{nowrap|''n'' {{=}} 9}} it is already only about 0.1%.

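The size of the remaining bias can be checked by simulation. A minimal Python sketch (assuming NumPy; the replication count is an arbitrary illustrative choice):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def rule_of_thumb_std(x):
    """Nearly unbiased sigma estimate for normal data: n - 1.5 in the denominator."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1.5))

for n in (3, 9):
    est = [rule_of_thumb_std(rng.normal(0.0, 1.0, n)) for _ in range(100_000)]
    print(n, np.mean(est))   # roughly 1.023 for n = 3 and 1.001 for n = 9 (sigma = 1)
</syntaxhighlight>
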
===Other distributions===
In cases where [[statistically independent]] data are modelled by a parametric family of distributions other than the [[normal distribution]], the population standard deviation will, if it exists, be a function of the parameters of the model. One general approach to estimation would be [[maximum likelihood]]. Alternatively, it may be possible to use the [[Rao–Blackwell theorem]] as a route to finding a good estimate of the standard deviation. In neither case would the estimates obtained usually be unbiased. Notionally, theoretical adjustments might be obtainable to lead to unbiased estimates but, unlike those for the normal distribution, these would typically depend on the estimated parameters.

If the requirement is simply to reduce the bias of an estimated standard deviation, rather than to eliminate it entirely, then two practical approaches are available, both within the context of [[Resampling (statistics)|resampling]]. These are [[Resampling (statistics)#Jackknife|jackknifing]] and [[Bootstrapping (statistics)|bootstrapping]]. Both can be applied either to parametrically based estimates of the standard deviation or to the sample standard deviation.

For non-normal distributions an approximate (up to ''O''(''n''<sup>−1</sup>) terms) formula for the unbiased estimator of the standard deviation is

: <math>
\hat\sigma = \sqrt{ \frac{1}{n - 1.5 - \tfrac14 \gamma_2} \sum_{i=1}^n (x_i - \bar{x})^2 },
</math>

where ''γ''<sub>2</sub> denotes the population [[excess kurtosis]]. The excess kurtosis may be either known beforehand for certain distributions, or estimated from the data.

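A minimal Python sketch of this estimator (assuming the NumPy and SciPy libraries; whether ''γ''<sub>2</sub> is supplied analytically or estimated from the data is left to the caller):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import kurtosis

def kurtosis_adjusted_std(x, gamma2=None):
    """Approximately unbiased sigma estimate: n - 1.5 - gamma2/4 in the denominator.

    gamma2 is the population excess kurtosis; if None, it is estimated from
    the data, which introduces additional sampling noise.
    """
    x = np.asarray(x, dtype=float)
    if gamma2 is None:
        gamma2 = kurtosis(x, fisher=True, bias=False)   # sample excess kurtosis
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1.5 - 0.25 * gamma2))
</syntaxhighlight>

With ''γ''<sub>2</sub> = 0 this reduces to the rule of thumb for the normal distribution given above.
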
==Effect of autocorrelation (serial correlation)==

The material above, to stress the point again, applies only to independent data. However, real-world data often do not meet this requirement; they are [[autocorrelation|autocorrelated]] (also known as serially correlated). As one example, the successive readings of a measurement instrument that incorporates some form of "smoothing" (more correctly, "filtering") process will be autocorrelated, since the current reading is calculated from some combination of the prior readings.

Estimates of the variance, and of the standard deviation, of autocorrelated data will be biased. The expected value of the sample variance is<ref>Law and Kelton, ''Simulation Modeling and Analysis'', 2nd Ed., McGraw-Hill (1991), p. 284, ISBN 0-07-036698-5. This expression can be derived from its original source in Anderson, ''The Statistical Analysis of Time Series'', Wiley (1971), ISBN 0-471-04745-7, p. 448, Equation 51.</ref>

:<math>
\operatorname{E}\left[ s^2 \right] = \sigma^2 \left[ 1 - \frac{2}{n-1} \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_k \right]
</math>

where ''n'' is the sample size (number of measurements) and <math>\rho_k</math> is the autocorrelation function (ACF) of the data. (Note that the expression in the brackets is simply one minus the average expected autocorrelation for the readings.) If the ACF consists of positive values then the estimate of the variance (and its square root, the standard deviation) will be biased low. That is, the actual variability of the data will be greater than that indicated by an uncorrected variance or standard deviation calculation. It is essential to recognize that, if this expression is to be used to correct for the bias, by dividing the estimate <math>s^2</math> by the quantity in brackets above, then the ACF must be known '''analytically''', not via estimation from the data. This is because the estimated ACF will itself be biased.<ref>Law and Kelton, p. 286. This bias is quantified in Anderson, p. 448, Equations 52–54.</ref>

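The bracketed factor can be computed directly once the ACF is known. A minimal Python sketch (assuming NumPy; <code>acf</code> is a hypothetical callable returning the analytic autocorrelation at lag ''k''):

<syntaxhighlight lang="python">
import numpy as np

def variance_bias_factor(acf, n):
    """The bracketed factor in E[s^2]: 1 - (2/(n-1)) * sum_{k=1}^{n-1} (1 - k/n) rho_k.

    `acf` must be the analytically known ACF, not one estimated from the
    data, since the estimated ACF is itself biased.
    """
    k = np.arange(1, n)
    return 1.0 - (2.0 / (n - 1)) * np.sum((1.0 - k / n) * acf(k))

# Example with the geometric ACF rho_k = (1 - alpha)^k used in the next section
print(variance_bias_factor(lambda k: 0.5 ** k, n=10))   # noticeably below 1
</syntaxhighlight>
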
===Example of bias in standard deviation===
To illustrate the magnitude of the bias in the standard deviation, consider a dataset that consists of sequential readings from an instrument that uses a specific digital filter whose ACF is known to be given by

:<math>\rho_k = \left( 1 - \alpha \right)^k</math>

where ''α'' is the parameter of the filter, which takes values from zero to unity. Thus the ACF is positive and geometrically decreasing. [[Image:Biascurves01.jpeg|thumb|Bias in standard deviation for autocorrelated data.]] The figure shows the ratio of the estimated standard deviation to its known value (which can be calculated analytically for this digital filter), for several settings of ''α'', as a function of sample size ''n''. Changing ''α'' alters the variance reduction ratio of the filter, which is known to be

:<math>\mathrm{VRR} = \frac{\alpha}{2 - \alpha}</math>

so that smaller values of ''α'' result in more variance reduction, or "smoothing". The bias is indicated by values on the vertical axis different from unity; that is, if there were no bias, the ratio of the estimated to known standard deviation would be unity. Clearly, for modest sample sizes there can be significant bias (a factor of two, or more).

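The behaviour in the figure can be reproduced by simulation. A minimal Python sketch (assuming NumPy, and assuming for illustration an exponential filter ''y''<sub>''t''</sub> = ''αx''<sub>''t''</sub> + (1 − ''α'')''y''<sub>''t''−1</sub> applied to unit-variance white noise, a filter which has exactly this geometric ACF and VRR):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def filtered_series(alpha, n, burn_in=500):
    """Exponential filter of white noise; its stationary ACF is rho_k = (1 - alpha)^k."""
    x = rng.normal(size=burn_in + n)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, x.size):
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y[burn_in:]            # discard the start-up transient

alpha, n = 0.2, 10
true_sd = np.sqrt(alpha / (2.0 - alpha))   # sqrt(VRR), for unit-variance input
ratios = [np.std(filtered_series(alpha, n), ddof=1) / true_sd for _ in range(5_000)]
print(np.mean(ratios))   # well below 1 for small n: the SD estimate is biased low
</syntaxhighlight>
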
===Variance of the mean===
It is often of interest to estimate the variance or standard deviation of an estimated '''mean''' rather than the variance of a population. When the data are autocorrelated, this has a direct effect on the theoretical variance of the sample mean, which is<ref>Law and Kelton, p. 285. This equation can be derived from Theorem 8.2.3 of Anderson. It also appears in Box, Jenkins, Reinsel, ''Time Series Analysis: Forecasting and Control'', 4th Ed., Wiley (2008), ISBN 978-0-470-27284-8, p. 31.</ref>

:<math>
\operatorname{Var}\left[ \bar{x} \right] = \frac{\sigma^2}{n} \left[ 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_k \right] .
</math>

The variance of the sample mean can then be estimated by substituting an estimate of ''σ''<sup>2</sup>. One such estimate can be obtained from the equation for E[''s''<sup>2</sup>] given above. First define the following constants, assuming, again, a '''known''' ACF:

:<math>
\gamma_1 \equiv 1 - \frac{2}{n-1} \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_k
\qquad\qquad
\gamma_2 \equiv 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_k
</math>

so that

:<math>
\operatorname{E}\left[ s^2 \right] = \sigma^2 \gamma_1 \quad \Rightarrow \quad \operatorname{E}\left[ \frac{s^2}{\gamma_1} \right] = \sigma^2 .
</math>

This says that the expected value of the quantity obtained by dividing the observed sample variance by the correction factor <math>\gamma_1</math> gives an unbiased estimate of the variance. Similarly, re-writing the expression above for the variance of the mean,

:<math>
\operatorname{Var}\left[ \bar{x} \right] = \frac{\sigma^2}{n} \gamma_2 ,
</math>

and substituting the estimate for <math>\sigma^2</math> gives<ref>Law and Kelton, p. 285.</ref>

:<math>
\operatorname{Var}\left[ \bar{x} \right] = \operatorname{E}\left[ \frac{s^2}{\gamma_1} \left( \frac{\gamma_2}{n} \right) \right] = \operatorname{E}\left[ \frac{s^2}{n} \left\{ \frac{n-1}{\frac{n}{\gamma_2} - 1} \right\} \right] ,
</math>

which is an unbiased estimator of the variance of the mean in terms of the observed sample variance and known quantities. Note that, if the autocorrelations <math>\rho_k</math> are identically zero, this expression reduces to the well-known result for the variance of the mean for independent data. The effect of the expectation operator in these expressions is that the equality holds in the mean (i.e., on average).

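A minimal Python sketch of these corrections (assuming NumPy; <code>acf</code> is again a hypothetical callable giving the known analytic ACF at lag ''k''):

<syntaxhighlight lang="python">
import numpy as np

def gamma_factors(acf, n):
    """gamma_1 and gamma_2 as defined above, from a known analytic ACF."""
    k = np.arange(1, n)
    s = np.sum((1.0 - k / n) * acf(k))
    return 1.0 - 2.0 * s / (n - 1), 1.0 + 2.0 * s

def unbiased_var_of_mean(x, acf):
    """Unbiased estimate of Var[x-bar] for autocorrelated data: (s^2 / gamma_1)(gamma_2 / n)."""
    x = np.asarray(x, dtype=float)
    g1, g2 = gamma_factors(acf, x.size)
    return np.var(x, ddof=1) / g1 * (g2 / x.size)
</syntaxhighlight>

For independent data (all <math>\rho_k</math> zero) both factors are unity and the familiar estimate ''s''<sup>2</sup>/''n'' is recovered.
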
===Estimating the standard deviation of the population===

Having the expressions above involving the '''variance''' of the population, and of an estimate of the mean of that population, it would seem logical to simply take the square root of these expressions to obtain unbiased estimates of the respective standard deviations. However, since expectations are integrals,

:<math>
\operatorname{E}[s] \ne \sqrt{ \operatorname{E}\left[ s^2 \right] } \ne \sigma \sqrt{ \gamma_1 } .
</math>

Instead, assume a function ''θ'' exists such that an unbiased estimator of the standard deviation can be written

:<math>
\operatorname{E}\left[ s \right] = \sigma\,\theta \sqrt{ \gamma_1 } \quad \Rightarrow \quad \hat{\sigma} = \frac{s}{\theta \sqrt{ \gamma_1 }} ,
</math>

where ''θ'' depends on the sample size ''n'' and the ACF. In the case of NID (normally and independently distributed) data, the radicand is unity and ''θ'' is just the ''c''<sub>4</sub> function given in the first section above. As with ''c''<sub>4</sub>, ''θ'' approaches unity as the sample size increases (as does ''γ''<sub>1</sub>).

It can be demonstrated via simulation modeling that ignoring ''θ'' (that is, taking it to be unity) and using

:<math>
\operatorname{E}[s] \approx \sigma \sqrt{ \gamma_1 } \quad \Rightarrow \quad \hat{\sigma} \approx \frac{s}{\sqrt{ \gamma_1 }}
</math>

removes all but a few percent of the bias caused by autocorrelation, making this a ''reduced''-bias estimator rather than an ''un''biased estimator. In practical measurement situations, this reduction in bias can be significant and useful, even if some relatively small bias remains. The figure above, showing an example of the bias in the standard deviation vs. sample size, is based on this approximation; the actual bias would be somewhat larger than indicated in those graphs since the transformation bias ''θ'' is not included there.

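A minimal Python sketch of this reduced-bias estimator (assuming NumPy; <code>acf</code> is again a hypothetical callable for the known analytic ACF):

<syntaxhighlight lang="python">
import numpy as np

def reduced_bias_std(x, acf):
    """Reduced-bias sigma estimate for autocorrelated data: s / sqrt(gamma_1).

    theta is taken as unity here, so a small residual bias remains.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(1, n)
    g1 = 1.0 - (2.0 / (n - 1)) * np.sum((1.0 - k / n) * acf(k))
    return np.std(x, ddof=1) / np.sqrt(g1)
</syntaxhighlight>
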
===Estimating the standard deviation of the mean===

The unbiased variance of the mean in terms of the population variance and the ACF is given by

:<math>
\operatorname{Var}\left[ \bar{x} \right] = \frac{\sigma^2}{n} \gamma_2 ,
</math>

and since there are no expected values here, in this case the square root can be taken, so that

:<math>
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{ \gamma_2 } .
</math>

Using the unbiased estimate expression above for ''σ'', an '''estimate''' of the standard deviation of the mean will then be

:<math>
\hat{\sigma}_{\bar{x}} = \frac{s}{\theta\,\sqrt{n}}\,\frac{\sqrt{ \gamma_2 }}{\sqrt{ \gamma_1 }} .
</math>

If the data are NID, so that the ACF vanishes, this reduces to

:<math>
\hat{\sigma}_{\bar{x}} = \frac{s}{c_4 \sqrt{n}} .
</math>

In the presence of a nonzero ACF, ignoring the function ''θ'' as before leads to the ''reduced''-bias estimator

:<math>
\hat{\sigma}_{\bar{x}} \approx \frac{s}{\sqrt{n}}\,\frac{\sqrt{ \gamma_2 }}{\sqrt{ \gamma_1 }} = \frac{s}{\sqrt{n}} \sqrt{ \frac{n-1}{\frac{n}{\gamma_2} - 1} } ,
</math>

which again can be demonstrated to remove a useful majority of the bias.

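A minimal Python sketch of this reduced-bias estimate of the standard deviation of the mean (assuming NumPy; <code>acf</code> as in the earlier sketches):

<syntaxhighlight lang="python">
import numpy as np

def reduced_bias_sem(x, acf):
    """Reduced-bias standard deviation of the mean: (s / sqrt(n)) * sqrt(gamma_2 / gamma_1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(1, n)
    s_acf = np.sum((1.0 - k / n) * acf(k))
    g1 = 1.0 - 2.0 * s_acf / (n - 1)
    g2 = 1.0 + 2.0 * s_acf
    return np.std(x, ddof=1) / np.sqrt(n) * np.sqrt(g2 / g1)
</syntaxhighlight>
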
==See also==
*[[Bessel's correction]]
*[[Estimation of covariance matrices]]
*[[Sample mean and sample covariance]]

==References==
{{Reflist}}
* Douglas C. Montgomery and George C. Runger, ''Applied Statistics and Probability for Engineers'', 3rd edition, Wiley and Sons, 2003. (See Sections 7–2.2 and 16–5.)

==External links==
* A [http://www.geogebra.org/en/upload/files/nikenuke/helmert03.html Java interactive graphic] showing the Helmert PDF from which the bias correction factors are derived.
* [http://www.mathworks.com/matlabcentral/fileexchange/34408 Monte-Carlo simulation demo for unbiased estimation of standard deviation.]
* [http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc32.htm What are Variables Control Charts?]

{{NIST-PD}}

{{Statistics}}

{{DEFAULTSORT:Unbiased Estimation Of Standard Deviation}}
[[Category:Estimation for specific parameters]]
[[Category:Summary statistics]]
[[Category:Covariance and correlation]]