D'Agostino's K-squared test
In statistics, D'Agostino's K^{2} test is a goodness-of-fit measure of departure from normality: the test aims to establish whether or not a given sample comes from a normally distributed population. The test is based on transformations of the sample kurtosis and skewness, and has power only against the alternatives that the distribution is skewed and/or kurtic.
Skewness and kurtosis
In the following, let { x_{i} } denote a sample of n observations, g_{1} and g_{2} the sample skewness and kurtosis, m_{j} the j-th sample central moment, and x̄ the sample mean. (Note that quite frequently in the literature related to normality testing the skewness and kurtosis are denoted √β_{1} and β_{2} respectively. Such notation is less convenient since, for example, √β_{1} can be a negative quantity.)
The sample skewness and kurtosis are defined as

g_{1} = m_{3} / m_{2}^{3/2},   g_{2} = m_{4} / m_{2}^{2} - 3,   where   m_{j} = (1/n) Σ_{i=1}^{n} (x_{i} - x̄)^{j}.
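The definitions above translate directly into code. A minimal sketch in Python (the function name is illustrative, not part of any library):

```python
def sample_skew_kurt(xs):
    """Sample skewness g1 and (excess) kurtosis g2 via central moments m_j."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
```

For the symmetric sample [1, 2, 3, 4, 5] this yields g_{1} = 0 and g_{2} = -1.3.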
These quantities consistently estimate the theoretical skewness and kurtosis of the distribution, respectively. Moreover, if the sample indeed comes from a normal population, then the exact finite-sample distributions of the skewness and kurtosis can themselves be analysed in terms of their means μ_{1}, variances μ_{2}, skewnesses γ_{1}, and kurtoses γ_{2}. This has been done by Pearson (1931), who derived in particular the following expressions for the skewness:

μ_{1}(g_{1}) = 0,
μ_{2}(g_{1}) = 6(n - 2) / ((n + 1)(n + 3)),
γ_{1}(g_{1}) = 0,
γ_{2}(g_{1}) = 36(n - 7)(n^{2} + 2n - 5) / ((n - 2)(n + 5)(n + 7)(n + 9)),

and for the kurtosis:

μ_{1}(g_{2}) = -6 / (n + 1),
μ_{2}(g_{2}) = 24n(n - 2)(n - 3) / ((n + 1)^{2}(n + 3)(n + 5)),
γ_{1}(g_{2}) = (6(n^{2} - 5n + 2) / ((n + 7)(n + 9))) · √(6(n + 3)(n + 5) / (n(n - 2)(n - 3))).
For example, a sample of size n = 1000 drawn from a normally distributed population can be expected to have a skewness with mean 0 and standard deviation 0.08, and a kurtosis with mean approximately 0 and standard deviation 0.15, where SD indicates the standard deviation.
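The quoted standard deviations follow from Pearson's variance formulas above; a quick numerical check (helper names are illustrative):

```python
import math

def skewness_sd(n):
    # standard deviation of the sample skewness g1 under normality (Pearson, 1931)
    return math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))

def kurtosis_sd(n):
    # standard deviation of the sample kurtosis g2 under normality (Pearson, 1931)
    return math.sqrt(24 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5)))

print(round(skewness_sd(1000), 2))  # 0.08
print(round(kurtosis_sd(1000), 2))  # 0.15
```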
Transformed sample skewness and kurtosis
The sample skewness g_{1} and kurtosis g_{2} are both asymptotically normal. However, the rate of their convergence to the distribution limit is frustratingly slow, especially for g_{2}. For example, even with n = 5000 observations the sample kurtosis g_{2} has both skewness and kurtosis of approximately 0.3, which is not negligible. In order to remedy this situation, it has been suggested to transform the quantities g_{1} and g_{2} in a way that makes their distribution as close to standard normal as possible.
In particular, D'Agostino (1970) suggested the following transformation for the sample skewness:

Z_{1}(g_{1}) = δ ln( g_{1} / (α√μ_{2}) + √( g_{1}^{2} / (α^{2}μ_{2}) + 1 ) ),

where constants α and δ are computed as

W^{2} = √(2γ_{2} + 4) - 1,
δ = 1 / √(ln W),
α^{2} = 2 / (W^{2} - 1),

and where μ_{2} = μ_{2}(g_{1}) is the variance of g_{1} and γ_{2} = γ_{2}(g_{1}) is its kurtosis, as given in the previous section.
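This transformation can be transcribed directly into code; a sketch assuming Pearson's expressions for μ_{2}(g_{1}) and γ_{2}(g_{1}) (the function name is illustrative, and the formula requires n > 7 so that γ_{2}(g_{1}) > 0):

```python
import math

def z1(g1, n):
    """D'Agostino (1970) transformed skewness; approximately N(0, 1) under normality."""
    mu2 = 6 * (n - 2) / ((n + 1) * (n + 3))                # variance of g1
    gamma2 = (36 * (n - 7) * (n**2 + 2*n - 5)
              / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))   # kurtosis of g1
    w2 = math.sqrt(2 * gamma2 + 4) - 1                     # W^2
    delta = 1 / math.sqrt(math.log(math.sqrt(w2)))         # 1 / sqrt(ln W)
    alpha = math.sqrt(2 / (w2 - 1))
    t = g1 / (alpha * math.sqrt(mu2))
    return delta * math.log(t + math.sqrt(t * t + 1))
```

As a sanity check, a sample skewness of exactly zero maps to Z_{1} = 0.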
Similarly, Anscombe & Glynn (1983) suggested a transformation for g_{2}, which works reasonably well for sample sizes of 20 or greater:

Z_{2}(g_{2}) = √(9A/2) · ( 1 - 2/(9A) - ( (1 - 2/A) / (1 + x√(2/(A - 4))) )^{1/3} ),

where

x = (g_{2} - μ_{1}) / √μ_{2},
A = 6 + (8/γ_{1}) ( 2/γ_{1} + √(1 + 4/γ_{1}^{2}) ),
and μ_{1} = μ_{1}(g_{2}), μ_{2} = μ_{2}(g_{2}), γ_{1} = γ_{1}(g_{2}) are the quantities computed by Pearson.
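The kurtosis transformation admits the same direct transcription; a sketch (illustrative name; a signed cube root is used because x ** (1/3) is not real-valued in Python for negative x):

```python
import math

def z2(g2, n):
    """Anscombe & Glynn (1983) transformed kurtosis; approximately N(0, 1) under normality."""
    mu1 = -6 / (n + 1)                                                   # mean of g2
    mu2 = 24 * n * (n - 2) * (n - 3) / ((n + 1)**2 * (n + 3) * (n + 5))  # variance of g2
    gamma1 = (6 * (n**2 - 5*n + 2) / ((n + 7) * (n + 9))
              * math.sqrt(6 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))  # skewness of g2
    a = 6 + 8 / gamma1 * (2 / gamma1 + math.sqrt(1 + 4 / gamma1**2))
    x = (g2 - mu1) / math.sqrt(mu2)
    inner = (1 - 2 / a) / (1 + x * math.sqrt(2 / (a - 4)))
    cbrt = math.copysign(abs(inner) ** (1 / 3), inner)   # signed cube root
    return math.sqrt(9 * a / 2) * (1 - 2 / (9 * a) - cbrt)
```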
Omnibus K^{2} statistic
Statistics Z_{1} and Z_{2} can be combined to produce an omnibus test, able to detect deviations from normality due to either skewness or kurtosis (D'Agostino & Pearson 1973):

K^{2} = Z_{1}(g_{1})^{2} + Z_{2}(g_{2})^{2}.
If the null hypothesis of normality is true, then K^{2} is approximately χ^{2}-distributed with 2 degrees of freedom.
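Putting the pieces together gives a self-contained sketch of the whole test. For two degrees of freedom the χ^{2} survival function is exactly exp(-K^{2}/2), so no special functions are needed. This is a transcription of the formulas in this article, not a reference implementation; SciPy's scipy.stats.normaltest provides a tested equivalent.

```python
import math
import random

def dagostino_k2(xs):
    """K^2 statistic and its chi^2(2) p-value, transcribed from this article's formulas."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((v - mean) ** 2 for v in xs) / n
    m3 = sum((v - mean) ** 3 for v in xs) / n
    m4 = sum((v - mean) ** 4 for v in xs) / n
    g1 = m3 / m2 ** 1.5                       # sample skewness
    g2 = m4 / m2 ** 2 - 3                     # sample kurtosis

    # Z1: transformed skewness (D'Agostino, 1970)
    mu2 = 6 * (n - 2) / ((n + 1) * (n + 3))
    gamma2 = 36 * (n - 7) * (n**2 + 2*n - 5) / ((n - 2) * (n + 5) * (n + 7) * (n + 9))
    w2 = math.sqrt(2 * gamma2 + 4) - 1
    t = g1 * math.sqrt((w2 - 1) / (2 * mu2))  # = g1 / (alpha * sqrt(mu2))
    z1 = math.log(t + math.sqrt(t * t + 1)) / math.sqrt(0.5 * math.log(w2))

    # Z2: transformed kurtosis (Anscombe & Glynn, 1983)
    mu1k = -6 / (n + 1)
    mu2k = 24 * n * (n - 2) * (n - 3) / ((n + 1)**2 * (n + 3) * (n + 5))
    g1k = (6 * (n**2 - 5*n + 2) / ((n + 7) * (n + 9))
           * math.sqrt(6 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6 + 8 / g1k * (2 / g1k + math.sqrt(1 + 4 / g1k**2))
    x = (g2 - mu1k) / math.sqrt(mu2k)
    inner = (1 - 2 / a) / (1 + x * math.sqrt(2 / (a - 4)))
    z2 = math.sqrt(9 * a / 2) * (1 - 2 / (9 * a) - math.copysign(abs(inner) ** (1/3), inner))

    k2 = z1 * z1 + z2 * z2
    return k2, math.exp(-k2 / 2)              # chi^2(2) survival function is exp(-x/2)

random.seed(1)
k2, p = dagostino_k2([random.gauss(0, 1) for _ in range(500)])
```

A large p suggests no evidence against normality; a small p suggests the sample is skewed and/or kurtic.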
Note that the statistics g_{1} and g_{2} are not independent, only uncorrelated. Therefore their transforms Z_{1} and Z_{2} will be dependent as well, rendering the validity of the χ^{2} approximation questionable. Simulations show that under the null hypothesis the K^{2} test statistic is characterized by
|  | expected value | standard deviation | 95% quantile |
| --- | --- | --- | --- |
| n = 20 | 1.971 | 2.339 | 6.373 |
| n = 50 | 2.017 | 2.308 | 6.339 |
| n = 100 | 2.026 | 2.267 | 6.271 |
| n = 250 | 2.012 | 2.174 | 6.129 |
| n = 500 | 2.009 | 2.113 | 6.063 |
| n = 1000 | 2.000 | 2.062 | 6.038 |
| χ^{2}(2) distribution | 2.000 | 2.000 | 5.991 |
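Figures like those in the table can be approximated by Monte Carlo simulation; a self-contained sketch (sample size, replication count, and seed are arbitrary choices):

```python
import math
import random

def k2_stat(xs):
    # D'Agostino K^2, transcribed from the formulas in this article
    n = len(xs)
    mean = sum(xs) / n
    m2, m3, m4 = (sum((v - mean) ** j for v in xs) / n for j in (2, 3, 4))
    g1, g2 = m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
    # transformed skewness
    mu2 = 6 * (n - 2) / ((n + 1) * (n + 3))
    gamma2 = 36 * (n - 7) * (n**2 + 2*n - 5) / ((n - 2) * (n + 5) * (n + 7) * (n + 9))
    w2 = math.sqrt(2 * gamma2 + 4) - 1
    t = g1 * math.sqrt((w2 - 1) / (2 * mu2))
    z1 = math.log(t + math.sqrt(t * t + 1)) / math.sqrt(0.5 * math.log(w2))
    # transformed kurtosis
    mu1k = -6 / (n + 1)
    mu2k = 24 * n * (n - 2) * (n - 3) / ((n + 1)**2 * (n + 3) * (n + 5))
    g1k = (6 * (n**2 - 5*n + 2) / ((n + 7) * (n + 9))
           * math.sqrt(6 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6 + 8 / g1k * (2 / g1k + math.sqrt(1 + 4 / g1k**2))
    x = (g2 - mu1k) / math.sqrt(mu2k)
    inner = (1 - 2 / a) / (1 + x * math.sqrt(2 / (a - 4)))
    z2 = math.sqrt(9 * a / 2) * (1 - 2 / (9 * a) - math.copysign(abs(inner) ** (1/3), inner))
    return z1 * z1 + z2 * z2

random.seed(0)
draws = sorted(k2_stat([random.gauss(0, 1) for _ in range(100)]) for _ in range(2000))
mean_k2 = sum(draws) / len(draws)
q95 = draws[int(0.95 * len(draws))]
# compare with the n = 100 row: expected value about 2.03, 95% quantile about 6.27
```

With only 2000 replications the Monte Carlo error is a few hundredths, so the estimates land near the tabulated values rather than matching them exactly.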
References
- Anscombe, F. J.; Glynn, William J. (1983). "Distribution of the kurtosis statistic b2 for normal samples". Biometrika 70 (1): 227–234.
- D'Agostino, Ralph B. (1970). "Transformation to normality of the null distribution of g1". Biometrika 57 (3): 679–681.
- D'Agostino, Ralph B.; Pearson, E. S. (1973). "Tests for departure from normality. Empirical results for the distributions of b2 and √b1". Biometrika 60 (3): 613–622.
- D'Agostino, Ralph B.; Belanger, Albert; D'Agostino, Ralph B. Jr. (1990). "A suggestion for using powerful and informative tests of normality". The American Statistician 44 (4): 316–321.
- Pearson, Egon S. (1931). "Note on tests for normality". Biometrika 22 (3/4): 423–424.