| '''Winsorising''' or '''Winsorization''' (also sometimes called Georgization{{citation needed|date=November 2013}}) is the transformation of [[statistic]]s by limiting [[extreme value]]s in the [[statistics|statistical]] data to reduce the effect of possibly spurious [[outliers]]. It is named after the engineer-turned-biostatistician [[Charles P. Winsor]] (1895–1951). The effect is the same as [[clipping (signal processing)|clipping]] in signal processing.
| | |
| The distribution of many [[statistic]]s can be heavily influenced by [[outlier]]s. A typical strategy is to set all outliers to a specified [[percentile]] of the data; for example, a 90% Winsorisation would see all data below the 5th percentile set to the 5th percentile, and data above the 95th percentile set to the 95th percentile.
| |
| Winsorised [[estimator]]s are usually more [[robust statistics|robust]] to outliers than their more standard forms, although there are alternatives, such as [[Trimmed estimator|trimming]], that will achieve a similar effect.
| |
| | |
| == Example ==
| |
| Consider the data set consisting of:
| |
| :<math>\{92, 19, \mathbf{101}, 58, \mathbf{153}, 91, 26, 78, 10, 13, \mathbf{-40}, \mathbf{101}, 86, 85, 15, 89, 89, 25, \mathbf{2}, 41\} \qquad (N = 20)</math>
| |
| The 5th percentile lies between -40 and 2, while the 95th percentile lies between 101 and 153. (Values shown in bold.)
| |
| Then a 90% Winsorisation would result in the following:
| |
| :<math>\{92, 19, \mathbf{101}, 58, \mathbf{101}, 91, 26, 78, 10, 13, \mathbf{2}, \mathbf{101}, 86, 85, 15, 89, 89, 25, \mathbf{2}, 41\} \qquad (N = 20)</math>
| |
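The example above can be reproduced with a short sketch. It assumes a count-based rule (replace the lowest and highest 5% of the ''N'' values with their nearest retained neighbours) rather than an interpolated percentile, since that is what matches the numbers shown; the function name <code>winsorize</code> and its <code>tail_percent</code> parameter are illustrative and not taken from any particular library.

```python
def winsorize(values, tail_percent=5):
    """Return a copy of `values` with each tail clipped (illustrative sketch).

    `tail_percent` is the integer percentage of observations replaced in
    EACH tail, so tail_percent=5 corresponds to a 90% Winsorisation.
    """
    n = len(values)
    k = n * tail_percent // 100  # number of values replaced per tail
    if 2 * k >= n:
        raise ValueError("tails would cover the whole data set")
    ordered = sorted(values)
    low = ordered[k]           # smallest value that is kept as-is
    high = ordered[n - k - 1]  # largest value that is kept as-is
    # Clip every observation into [low, high], preserving the original order.
    return [min(max(x, low), high) for x in values]


data = [92, 19, 101, 58, 153, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 25, 2, 41]

winsorized = winsorize(data, tail_percent=5)
print(winsorized)
```

With ''N'' = 20 and 5% per tail, exactly one value is replaced at each end: −40 becomes 2 and 153 becomes 101. A rule based on interpolated percentiles would generally clip to slightly different limits.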
| | |
| == Distinction from trimming ==
| |
| Winsorizing is not equivalent to simply excluding data, a simpler procedure called [[trimmed estimator|trimming]] or [[Truncation (statistics)|truncation]]; it is instead a method of [[Censoring (statistics)|censoring]] data.
| |
| | |
| In a trimmed estimator, the extreme values are ''discarded;'' in a Winsorized estimator, the extreme values are instead ''replaced'' by certain percentiles (the trimmed minimum and maximum).
| |
| | |
| Thus a [[Winsorized mean]] is not the same as a [[truncated mean]].
| |
| For instance, the 10% trimmed mean is the average of the data lying between the 5th and 95th percentiles, while the 90% Winsorised mean sets the bottom 5% to the 5th percentile and the top 5% to the 95th percentile, and then averages all of the data. In the previous example the trimmed mean would be obtained from the smaller set:
| |
| :<math>\{92, 19, \mathbf{101}, 58, \quad 91, 26, 78, 10, 13, \quad \mathbf{101}, 86, 85, 15, 89, 89, 25, \mathbf{2}, 41\} \qquad (N = 18)</math>
| |
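The difference between the two estimators can be checked numerically on the example data. The following is a minimal sketch, assuming one value is dropped (trimming) or replaced (Winsorising) in each tail, as in the sets shown above; the helper names <code>trimmed_mean</code> and <code>winsorized_mean</code> and the parameter <code>k</code> are illustrative.

```python
def trimmed_mean(values, k):
    """Mean after discarding the k smallest and k largest values."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this data set")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest values that are kept."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this data set")
    ordered = sorted(values)
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    clipped = [min(max(x, low), high) for x in values]
    return sum(clipped) / len(clipped)


data = [92, 19, 101, 58, 153, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 25, 2, 41]

plain = sum(data) / len(data)       # uses all 20 original values
trimmed = trimmed_mean(data, 1)     # averages the 18 remaining values
winsorized = winsorized_mean(data, 1)  # averages 20 values, two replaced
print(plain, trimmed, winsorized)
```

For this data the ordinary mean is 56.7, the trimmed mean is 1021/18 ≈ 56.72 and the Winsorised mean is 56.2, so the two procedures give different results even though they treat the same two observations as extreme.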
| | |
| More formally, they are distinct because the [[order statistics]] are not independent.
| |
| | |
| == References ==
| |
| | |
| * Hastings, C., Mosteller, F., Tukey, J.W., Winsor, C.P. (1947) ''Low moments for small samples: a comparative study of order statistics'', [[Annals of Mathematical Statistics]], 18, 413–426.
| |
| * W. J. Dixon (1960). ''Simplified Estimation from Censored Normal Samples'', The Annals of Mathematical Statistics, 31, 385–391.
| |
| * [[John Tukey|J. W. Tukey]] (1962) ''The Future of Data Analysis'', The Annals of Mathematical Statistics, 33, p. 18
| |
| | |
| [[Category:Statistical theory]]
| |
| [[Category:Robust statistics]]
| |
| | |
| | |
| {{Statistics-stub}}
| |