Main Page: Difference between revisions

From formulasearchengine
'''Least trimmed squares''' ('''LTS'''), or '''least trimmed sum of squares''', is a [[robust statistics|robust statistical method]] that fits a function to a set of data whilst not being unduly affected by the presence of [[outlier]]s. It is one of a number of methods for [[robust regression]].

== Description of method ==
Instead of the standard [[least squares]] method, which minimises the [[Residual sum of squares|sum of squared residuals]] over ''n'' points, the LTS method attempts to minimise the sum of squared residuals over a subset, ''k'', of those points. The ''n''&nbsp;&minus;&nbsp;''k'' points which are not used do not influence the fit.

In a standard least squares problem, the estimated parameter values, &beta;, are defined to be those values that minimise the objective function, ''S''(&beta;), of squared residuals
:<math>S(\beta)=\sum_{i=1}^{n}{r_i(\beta)}^2,</math>
where the [[errors and residuals in statistics|residuals]] are defined as the differences between the values of the [[Dependent and independent variables|dependent variables]] (observations) and the model values
:<math>r_i(\beta)= y_i - f(x_i, \beta),</math>
and where ''n'' is the overall number of data points. For a least trimmed squares analysis, this objective function is replaced by one constructed in the following way. For a fixed value of &beta;, let <math>r_{(j)}(\beta)</math> denote the ordered absolute values of the residuals (in increasing order of absolute value). In this notation, the standard sum of squares function is
:<math>S(\beta)=\sum_{j=1}^n (r_{(j)}(\beta))^2,</math>
while the objective function for LTS is
:<math>S_k(\beta)=\sum_{j=1}^k (r_{(j)}(\beta))^2.</math>

== Computational considerations ==
Because this method is binary, in that points are either included or excluded, no closed-form solution exists. As a result, methods that try to find an LTS solution sift through combinations of the data, attempting to find the subset of ''k'' points that yields the lowest sum of squared residuals. Methods exist for low ''n'' that will find the exact solution; however, as ''n'' rises, the number of combinations grows rapidly, so in practice one resorts to methods that find approximate (but generally sufficient) solutions.

== References ==
* [[Peter Rousseeuw|Rousseeuw, P. J.]] (1984) "Least Median of Squares Regression", ''Journal of the American Statistical Association'', 79, 871&ndash;880. {{JSTOR|2288718}}
* Rousseeuw, P. J., Leroy, A. M. (1987) ''Robust Regression and Outlier Detection'', Wiley. ISBN 978-0-471-85233-9 (Published online 2005 {{DOI|10.1002/0471725382}})
* Li, L. M. (2005) "An algorithm for computing exact least-trimmed squares estimate of simple linear regression with constraints", ''Computational Statistics & Data Analysis'', 48 (4), 717&ndash;734. {{DOI|10.1016/j.csda.2004.04.003}}
* Atkinson, A. C., Cheng, T.-C. (1999) "Computing least trimmed squares regression with the forward search", ''Statistics and Computing'', 9 (4), 251&ndash;263. {{DOI|10.1023/A:1008942604045}}
* Jung, Kang-Mo (2007) "Least Trimmed Squares Estimator in the Errors-in-Variables Model", ''Journal of Applied Statistics'', 34 (3), 331&ndash;338. {{DOI|10.1080/02664760601004973}}

[[Category:Robust statistics]]
[[Category:Robust regression]]

In statistics, an '''[[F-test|''F''-test]] for the [[null hypothesis]] that two [[normal distribution|normal]] populations have the same [[variance]]''' is sometimes used, although it needs to be used with caution as it can be sensitive to the assumption that the variables have this distribution.

Notionally, any F-test can be regarded as a comparison of two variances, but the specific case discussed in this article is that of two populations, where the [[test statistic]] used is the ratio of two [[sample variance]]s. This particular situation is of importance in [[mathematical statistics]] since it provides a basic exemplar case in which the [[F-distribution]] can be derived.<ref>Johnson, N.L., Kotz, S., Balakrishnan, N. (1995) ''Continuous Univariate Distributions, Volume 2'', Wiley. ISBN 0-471-58494-0 (Section 27.1)</ref> For application in [[applied statistics]], there is concern{{Citation needed|date=May 2010}} that the test is so sensitive to the assumption of normality that it would be inadvisable to use it as a routine test for the equality of variances. In other words, this is a case where "approximate normality" (which in similar contexts would often be justified using the [[central limit theorem]]) is not good enough to make the test procedure approximately valid to an acceptable degree.

== The test ==
Let ''X''<sub>1</sub>,&nbsp;...,&nbsp;''X''<sub>''n''</sub> and ''Y''<sub>1</sub>,&nbsp;...,&nbsp;''Y''<sub>''m''</sub> be [[independent and identically distributed]] samples from two populations which each have a [[normal distribution]]. The [[expected value]]s for the two populations can be different, and the hypothesis to be tested is that the variances are equal. Let
: <math> \overline{X} = \frac{1}{n}\sum_{i=1}^n X_i\text{ and }\overline{Y} = \frac{1}{m}\sum_{i=1}^m Y_i</math>
be the [[sample mean]]s. Let
: <math> S_X^2 = \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \overline{X}\right)^2\text{ and }S_Y^2 = \frac{1}{m-1}\sum_{i=1}^m \left(Y_i - \overline{Y}\right)^2 </math>
be the [[sample variance]]s. Then the test statistic
: <math> F = \frac{S_X^2}{S_Y^2} </math>
 
has an [[F-distribution]] with ''n''&nbsp;&minus;&nbsp;1 and ''m''&nbsp;&minus;&nbsp;1 degrees of freedom if the [[null hypothesis]] of equality of variances is true. Otherwise it follows an F-distribution scaled by the ratio of the true variances. The null hypothesis is rejected if ''F'' is either too large or too small.
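The calculation can be sketched in a few lines of Python. The sample data below are invented for illustration, and the critical values mentioned in the comment would come from F-distribution tables or software; this is a sketch of the statistic, not a full test procedure:

```python
def sample_variance(xs):
    """Unbiased sample variance, using the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def f_statistic(xs, ys):
    """Ratio of sample variances; under H0 it follows F(n-1, m-1)."""
    return sample_variance(xs) / sample_variance(ys)

# Illustrative (made-up) samples:
X = [1, 2, 3, 4, 5]   # n = 5, sample variance 2.5
Y = [2, 4, 6, 8]      # m = 4, sample variance 20/3
F = f_statistic(X, Y)
print(round(F, 6))  # 0.375

# H0 is rejected when F falls outside the central (1 - alpha) region of
# F(n-1, m-1), i.e. below the alpha/2 quantile or above the 1 - alpha/2
# quantile (obtainable from tables or, e.g., scipy.stats.f.ppf).
```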
 
== Properties ==
This F-test is known to be extremely sensitive to [[normal distribution|non-normality]],<ref>{{cite journal | last=Box | first=G.E.P. |authorlink=George E. P. Box| journal=Biometrika | year=1953 | title=Non-Normality and Tests on Variances  | pages=318&ndash;335 | volume=40 | jstor=2333350 | issue=3/4 | doi=10.1093/biomet/40.3-4.318}}</ref><ref>{{cite journal | last=Markowski | first=Carol A |author2=Markowski, Edward P. | year = 1990 | title=Conditions for the Effectiveness of a Preliminary Test of Variance | journal=The American Statistician | pages=322&ndash;326 | volume=44 | jstor=2684360 | doi=10.2307/2684360 | issue=4}}</ref> so [[Levene's test]], [[Bartlett's test]], or the [[Brown–Forsythe test]] are better tests for testing the equality of two variances. (However, all of these tests create experiment-wise [[type I error]] inflations when conducted as a test of the assumption of [[homoscedasticity]] prior to a test of effects.<ref>Sawilowsky, S. (2002). [http://digitalcommons.wayne.edu/coe_tbf/23 "Fermat, Schubert, Einstein, and Behrens–Fisher:The Probable Difference Between Two Means When σ<sub>1</sub><sup>2</sup> ≠ σ<sub>2</sub><sup>2</sup>"], ''Journal of Modern Applied Statistical Methods'', ''1''(2), 461&ndash;472.</ref>) F-tests for the equality of variances can be used in practice, with care, particularly where a quick check is required, and subject to associated diagnostic checking: practical text-books<ref>Rees, D.G. (2001) ''Essential Statistics (4th Edition)'', Chapman & Hall/CRC, ISBN 1-58488-007-4. Section 10.15</ref> suggest both graphical and formal checks of the assumption.
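As a concrete illustration of one of these more robust alternatives, the Brown–Forsythe statistic replaces each observation by its absolute deviation from the group median and then applies the usual one-way ANOVA F-statistic to those deviations. The sketch below uses made-up data; in practice a library routine (e.g. scipy.stats.levene with center='median') would normally be used instead:

```python
from statistics import median

def brown_forsythe(*groups):
    """Brown-Forsythe statistic: a Levene-type statistic computed on
    absolute deviations from each group's median.  Under H0 (equal
    variances) it is approximately F(k - 1, N - k) distributed."""
    k = len(groups)
    z = [[abs(x - median(g)) for x in g] for g in groups]
    n = [len(g) for g in groups]
    N = sum(n)
    zbar_i = [sum(zi) / ni for zi, ni in zip(z, n)]     # group means of z
    zbar = sum(sum(zi) for zi in z) / N                 # grand mean of z
    num = sum(ni * (zb - zbar) ** 2 for ni, zb in zip(n, zbar_i))
    den = sum((zij - zb) ** 2 for zi, zb in zip(z, zbar_i) for zij in zi)
    return (N - k) / (k - 1) * num / den

# Two made-up groups with different spreads:
W = brown_forsythe([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Because the group medians absorb location differences and absolute deviations are less affected by heavy tails than squared ones, this statistic is far less sensitive to non-normality than the variance-ratio F-test above.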
 
[[F-test]]s are used for other statistical [[hypothesis test|tests of hypotheses]], such as testing for differences in means in three or more groups, or in factorial layouts. These F-tests are generally not [[robust statistics|robust]] when there are violations of the assumption that each population follows the [[normal distribution]], particularly for small alpha levels and unbalanced layouts.<ref>Blair, R. C. (1981). "A reaction to ‘Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance.’" ''Review of Educational Research'', ''51'', 499–507.</ref> However, for large alpha levels (e.g., at least 0.05) and balanced layouts, the F-test is relatively robust, although (if the normality assumption does not hold) it suffers from a loss in comparative statistical power as compared with non-parametric counterparts.
 
==Generalization==
The immediate generalization of the problem outlined above is to situations where there are more than two groups or populations, and the hypothesis is that all of the variances are equal. This is the problem treated by [[Hartley's test]] and [[Bartlett's test]].
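Hartley's statistic, for instance, is simply the ratio of the largest to the smallest of the ''k'' sample variances (the test assumes normality and equal group sizes). A minimal sketch with made-up data; the critical values would come from tabulated F<sub>max</sub> distributions:

```python
def hartley_fmax(groups):
    """Hartley's F_max: ratio of the largest to the smallest sample
    variance across groups (groups assumed to be of equal size)."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    variances = [var(g) for g in groups]
    return max(variances) / min(variances)

# Two made-up groups: sample variances 2.5 and 10.0, so F_max = 4.0.
print(hartley_fmax([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]))  # 4.0
```

The statistic is then compared against the tabulated F<sub>max</sub> critical value for ''k'' groups and ''n''&nbsp;&minus;&nbsp;1 degrees of freedom.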
 
==See also==
*[[Goldfeld–Quandt test]]
 
==References==
{{reflist|30em}}
 
{{Statistics}}
 
{{DEFAULTSORT:F-Test Of Equality Of Variances}}
[[Category:Statistical ratios]]
[[Category:Statistical tests]]

Revision as of 04:43, 18 August 2014
