In [[statistics]], '''D’Agostino’s ''K''<sup>2</sup> test''' is a [[goodness-of-fit]] measure of departure from [[normal distribution|normality]]; that is, the test aims to establish whether or not a given sample comes from a normally distributed population. The test is based on transformations of the sample [[kurtosis]] and [[skewness]], and has power only against the alternatives that the distribution is skewed and/or kurtic.

== Skewness and kurtosis ==

In the following, let { ''x<sub>i</sub>'' } denote a sample of ''n'' observations, ''g''<sub>1</sub> and ''g''<sub>2</sub> the sample [[skewness]] and [[kurtosis]], ''m<sub>j</sub>'' the ''j''-th sample [[central moment]], and <math style="position:relative;top:-.3em">\bar{x}</math> the sample [[mean]]. (Note that quite frequently in the literature related to [[normality tests|normality testing]] the skewness and kurtosis are denoted √''β''<sub>1</sub> and ''β''<sub>2</sub> respectively; such notation is less convenient since, for example, √''β''<sub>1</sub> can be a negative quantity.)

The sample skewness and kurtosis are defined as
: <math>\begin{align}
  & g_1 = \frac{ m_3 }{ m_2^{3/2} } = \frac{\frac{1}{n} \sum_{i=1}^n \left( x_i - \bar{x} \right)^3}{\left( \frac{1}{n} \sum_{i=1}^n \left( x_i - \bar{x} \right)^2 \right)^{3/2}}\ , \\
  & g_2 = \frac{ m_4 }{ m_2^{2} } - 3 = \frac{\frac{1}{n} \sum_{i=1}^n \left( x_i - \bar{x} \right)^4}{\left( \frac{1}{n} \sum_{i=1}^n \left( x_i - \bar{x} \right)^2 \right)^2} - 3\ .
\end{align}</math>
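
A minimal sketch of these definitions in code (illustrative only; the function name and the use of NumPy are assumptions, not part of the article):

<syntaxhighlight lang="python">
import numpy as np

def sample_skew_kurt(x):
    """Sample skewness g1 and excess kurtosis g2, per the definitions above (sketch)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    m4 = np.mean(d ** 4)   # fourth central moment
    g1 = m3 / m2 ** 1.5    # g1 = m3 / m2^(3/2)
    g2 = m4 / m2 ** 2 - 3  # g2 = m4 / m2^2 - 3
    return g1, g2
</syntaxhighlight>

These values agree with <code>scipy.stats.skew</code> and <code>scipy.stats.kurtosis</code> at their default settings (biased moments, excess kurtosis).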

These quantities [[consistent estimator|consistently]] estimate the theoretical skewness and kurtosis of the distribution, respectively. Moreover, if the sample indeed comes from a normal population, then the exact finite-sample distributions of the skewness and kurtosis can themselves be analysed in terms of their means ''μ''<sub>1</sub>, variances ''μ''<sub>2</sub>, skewnesses ''γ''<sub>1</sub>, and kurtoses ''γ''<sub>2</sub>. This has been done by {{harvtxt|Pearson|1931}}, who derived the following expressions:{{better source|reason=need more accessible source so that quoted expression can be checked|date=November 2010}}
: <math>\begin{align}
  & \mu_1(g_1) = 0, \\
  & \mu_2(g_1) = \frac{ 6(n-2) }{ (n+1)(n+3) }, \\
  & \gamma_1(g_1) \equiv \frac{\mu_3(g_1)}{\mu_2(g_1)^{3/2}} = 0, \\
  & \gamma_2(g_1) \equiv \frac{\mu_4(g_1)}{\mu_2(g_1)^{2}} - 3 = \frac{ 36(n-7)(n^2+2n-5) }{ (n-2)(n+5)(n+7)(n+9) }.
\end{align}</math>

and

: <math>\begin{align}
  & \mu_1(g_2) = -\frac{6}{n+1}, \\
  & \mu_2(g_2) = \frac{ 24n(n-2)(n-3) }{ (n+1)^2(n+3)(n+5) }, \\
  & \gamma_1(g_2) \equiv \frac{\mu_3(g_2)}{\mu_2(g_2)^{3/2}} = \frac{6(n^2-5n+2)}{(n+7)(n+9)} \sqrt{\frac{6(n+3)(n+5)}{n(n-2)(n-3)}}, \\
  & \gamma_2(g_2) \equiv \frac{\mu_4(g_2)}{\mu_2(g_2)^{2}} - 3 = \frac{ 36(15n^6 - 36n^5 - 628n^4 + 982n^3 + 5777n^2 - 6402n + 900) }{ n(n-3)(n-2)(n+7)(n+9)(n+11)(n+13) }.
\end{align}</math>

For example, a sample of size {{nowrap|''n'' {{=}} 1000}} drawn from a normally distributed population can be expected to have a skewness of {{nowrap|0 (SD 0.08)}} and a kurtosis of {{nowrap|0 (SD 0.15)}}, where SD indicates the standard deviation.{{citation needed|date=January 2012}}
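
As a quick check of these figures (an illustrative calculation, not from the article), plugging {{nowrap|1=''n'' = 1000}} into Pearson’s expressions for the variances gives:

<syntaxhighlight lang="python">
# Variances of g1 and g2 under normality, from Pearson's expressions above.
n = 1000
var_g1 = 6 * (n - 2) / ((n + 1) * (n + 3))
var_g2 = 24 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
print(var_g1 ** 0.5, var_g2 ** 0.5)  # standard deviations: about 0.077 and 0.154
</syntaxhighlight>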

== Transformed sample skewness and kurtosis ==

The sample skewness ''g''<sub>1</sub> and kurtosis ''g''<sub>2</sub> are both asymptotically normal. However, the rate of their convergence to the distributional limit is frustratingly slow, especially for ''g''<sub>2</sub>. For example, even with {{nowrap|1=''n'' = 5000}} observations the sample kurtosis ''g''<sub>2</sub> has skewness and kurtosis of approximately 0.3, which is not negligible. In order to remedy this situation, it has been suggested to transform the quantities ''g''<sub>1</sub> and ''g''<sub>2</sub> in a way that makes their distribution as close to standard normal as possible.

In particular, {{harvtxt|D’Agostino|1970}} suggested the following transformation for the sample skewness:
: <math>
    Z_1(g_1) = \delta \cdot \ln\!\left( \frac{g_1}{\alpha\sqrt{\mu_2}} + \sqrt{\frac{g_1^2}{\alpha^2\mu_2} + 1}\right),
  </math>
where the constants ''α'' and ''δ'' are computed as
: <math>\begin{align}
  & W^2 = \sqrt{2\gamma_2 + 4} - 1, \\
  & \delta = 1 / \sqrt{\ln W}, \\
  & \alpha^2 = 2 / (W^2 - 1),
\end{align}</math>
and where ''μ''<sub>2</sub> = ''μ''<sub>2</sub>(''g''<sub>1</sub>) is the variance of ''g''<sub>1</sub> and ''γ''<sub>2</sub> = ''γ''<sub>2</sub>(''g''<sub>1</sub>) is its kurtosis, as given by the expressions in the previous section.
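
A sketch of this transformation in code (illustrative; the helper name <code>z1</code> is an assumption, and ''μ''<sub>2</sub> and ''γ''<sub>2</sub> are recomputed from Pearson’s expressions):

<syntaxhighlight lang="python">
import math

def z1(g1, n):
    """D'Agostino (1970) transform of the sample skewness (sketch; needs n >= 8)."""
    mu2 = 6 * (n - 2) / ((n + 1) * (n + 3))               # variance of g1
    gamma2 = (36 * (n - 7) * (n ** 2 + 2 * n - 5)
              / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))  # kurtosis of g1
    W2 = math.sqrt(2 * gamma2 + 4) - 1
    delta = 1 / math.sqrt(0.5 * math.log(W2))             # delta = 1 / sqrt(ln W), W = sqrt(W2)
    alpha2 = 2 / (W2 - 1)
    t = g1 / math.sqrt(alpha2 * mu2)                      # t = g1 / (alpha * sqrt(mu2))
    return delta * math.asinh(t)                          # asinh(t) = ln(t + sqrt(t^2 + 1))
</syntaxhighlight>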

Similarly, {{harvtxt|Anscombe|Glynn|1983}} suggested a transformation for ''g''<sub>2</sub>, which works reasonably well for sample sizes of 20 or greater:
: <math>
    Z_2(g_2) = \sqrt{\frac{9A}{2}} \left\{1 - \frac{2}{9A} - \left(\frac{ 1-2/A }{ 1+\frac{g_2-\mu_1}{\sqrt{\mu_2}}\sqrt{2/(A-4)} }\right)^{\!1/3}\right\},
  </math>
where
: <math>
    A = 6 + \frac{8}{\gamma_1} \left( \frac{2}{\gamma_1} + \sqrt{1 + 4/\gamma_1^2}\right),
  </math>
and ''μ''<sub>1</sub> = ''μ''<sub>1</sub>(''g''<sub>2</sub>), ''μ''<sub>2</sub> = ''μ''<sub>2</sub>(''g''<sub>2</sub>), ''γ''<sub>1</sub> = ''γ''<sub>1</sub>(''g''<sub>2</sub>) are the quantities computed by Pearson above.
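
A matching sketch (again illustrative; <code>z2</code> is an assumed name, and a signed cube root guards against a negative base for extreme ''g''<sub>2</sub>):

<syntaxhighlight lang="python">
import math

def z2(g2, n):
    """Anscombe-Glynn (1983) transform of the sample kurtosis (sketch; n >= 20)."""
    mu1 = -6 / (n + 1)                                 # mean of g2
    mu2 = (24 * n * (n - 2) * (n - 3)
           / ((n + 1) ** 2 * (n + 3) * (n + 5)))       # variance of g2
    gamma1 = (6 * (n ** 2 - 5 * n + 2) / ((n + 7) * (n + 9))
              * math.sqrt(6 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    A = 6 + 8 / gamma1 * (2 / gamma1 + math.sqrt(1 + 4 / gamma1 ** 2))
    s = (g2 - mu1) / math.sqrt(mu2)                    # standardized kurtosis
    base = (1 - 2 / A) / (1 + s * math.sqrt(2 / (A - 4)))
    root = math.copysign(abs(base) ** (1 / 3), base)   # signed cube root
    return math.sqrt(9 * A / 2) * (1 - 2 / (9 * A) - root)
</syntaxhighlight>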

== Omnibus ''K''<sup>2</sup> statistic ==

Statistics ''Z''<sub>1</sub> and ''Z''<sub>2</sub> can be combined to produce an omnibus test, able to detect deviations from normality due to either skewness or kurtosis {{harv|D’Agostino|Belanger|D’Agostino|1990}}:
: <math>
    K^2 = Z_1(g_1)^2 + Z_2(g_2)^2.
  </math>

If the [[null hypothesis]] of normality is true, then ''K''<sup>2</sup> is approximately [[chi-squared distribution|''χ''<sup>2</sup>-distributed]] with 2 degrees of freedom.
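
Combining the two sketches above gives the complete test (a sketch under the same assumptions; <code>sample_skew_kurt</code>, <code>z1</code> and <code>z2</code> are the illustrative helpers defined earlier, not library functions):

<syntaxhighlight lang="python">
import math

def k2_test(x):
    """D'Agostino's K^2 statistic and its chi-squared(2) p-value (sketch)."""
    g1, g2 = sample_skew_kurt(x)
    n = len(x)
    K2 = z1(g1, n) ** 2 + z2(g2, n) ** 2
    p = math.exp(-K2 / 2)  # the chi-squared(2) survival function is exp(-x/2)
    return K2, p
</syntaxhighlight>

In practice one can also use <code>scipy.stats.normaltest</code>, which implements this omnibus test and returns the ''K''<sup>2</sup> statistic together with its ''χ''<sup>2</sup>(2) p-value.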

Note that the statistics ''g''<sub>1</sub> and ''g''<sub>2</sub> are not independent, only uncorrelated. Therefore their transforms ''Z''<sub>1</sub> and ''Z''<sub>2</sub> will also be dependent {{harv|Shenton|Bowman|1977}}, rendering the validity of the ''χ''<sup>2</sup> approximation questionable. Simulations show that under the null hypothesis the ''K''<sup>2</sup> test statistic is characterized by
<!-- each experiment was based on 1,000,000 simulations -->
{|class="wikitable" style="text-align:right"
|-
!
! expected value
! standard deviation
! 95% quantile
|-
|style="text-align:left"| ''n'' = 20
| 1.971
| 2.339
| 6.373
|-
|style="text-align:left"| ''n'' = 50
| 2.017
| 2.308
| 6.339
|-
|style="text-align:left"| ''n'' = 100
| 2.026
| 2.267
| 6.271
|-
|style="text-align:left"| ''n'' = 250
| 2.012
| 2.174
| 6.129
|-
|style="text-align:left"| ''n'' = 500
| 2.009
| 2.113
| 6.063
|-
|style="text-align:left"| ''n'' = 1000
| 2.000
| 2.062
| 6.038
|-
|style="text-align:left"| ''χ''<sup>2</sup>(2) distribution
| 2.000
| 2.000
| 5.991
|}
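
The table can be checked by simulation; each original experiment used 1,000,000 simulations, while the smaller illustrative run below (sample size, replication count and seed are arbitrary choices) is faster:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 100, 100_000  # far fewer replications than the original 1,000,000
k2 = np.array([stats.normaltest(rng.normal(size=n)).statistic
               for _ in range(reps)])
# Compare with the n = 100 row: mean ~ 2.03, SD ~ 2.27, 95% quantile ~ 6.27.
print(k2.mean(), k2.std(), np.quantile(k2, 0.95))
</syntaxhighlight>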

== References ==
{{Refbegin}}
* {{cite journal | last1 = Anscombe | first1 = F. J. | last2 = Glynn | first2 = William J. | year = 1983 | title = Distribution of the kurtosis statistic ''b''<sub>2</sub> for normal statistics | journal = [[Biometrika]] | volume = 70 | issue = 1 | pages = 227–234 | jstor = 2335960 | ref = CITEREFAnscombeGlynn1983 }}
* {{cite journal | last = D’Agostino | first = Ralph B. | year = 1970 | title = Transformation to normality of the null distribution of ''g''<sub>1</sub> | journal = [[Biometrika]] | volume = 57 | issue = 3 | pages = 679–681 | jstor = 2334794 | ref = CITEREFD.E2.80.99Agostino1970 }}
* {{cite journal | last1 = D’Agostino | first1 = Ralph B. | last2 = Belanger | first2 = Albert | last3 = D’Agostino | first3 = Ralph B., Jr | year = 1990 | title = A suggestion for using powerful and informative tests of normality | journal = [[The American Statistician]] | volume = 44 | issue = 4 | pages = 316–321 | jstor = 2684359 | url = http://www.cee.mtu.edu/~vgriffis/CE%205620%20materials/CE5620%20Reading/DAgostino%20et%20al%20-%20normaility%20tests.pdf | ref = CITEREFD.E2.80.99AgostinoBelangerD.E2.80.99Agostino1990 }}
* {{cite journal | last = Pearson | first = Egon S. | authorlink = Egon Pearson | year = 1931 | title = Note on tests for normality | journal = [[Biometrika]] | volume = 22 | issue = 3/4 | pages = 423–424 | jstor = 2332104 | ref = harv }}
* {{cite journal | last1 = Shenton | first1 = L. R. | last2 = Bowman | first2 = K. O. | year = 1977 | title = A bivariate model for the distribution of √''b''<sub>1</sub> and ''b''<sub>2</sub> | journal = [[Journal of the American Statistical Association]] | volume = 72 | issue = 357 | pages = 206–211 | jstor = 2286939 | ref = CITEREFShentonBowman1977 }}
{{Refend}}

{{DEFAULTSORT:D'agostino'S K-Squared Test}}
[[Category:Parametric statistics]]
[[Category:Normality tests]]