Magic series: Difference between revisions

en>Mogism
m Cleanup/Typo fixing, typos fixed: a exact → an exact (4) using AWB
en>David Eppstein
External links: templatize reference
 
'''Winsorising''' or '''Winsorization''' (sometimes also called Georgization{{citation needed|date=November 2013}}) is the transformation of [[statistic]]s by limiting [[extreme value]]s in the [[statistics|statistical]] data to reduce the effect of possibly spurious [[outliers]]. It is named after the engineer-turned-biostatistician [[Charles P. Winsor]] (1895–1951). The effect is the same as [[clipping (signal processing)|clipping]] in signal processing.
 
The distribution of many [[statistic]]s can be heavily influenced by [[outlier]]s. A typical strategy is to set all outliers to a specified [[percentile]] of the data; for example, a 90% Winsorisation would see all data below the 5th percentile set to the 5th percentile, and data above the 95th percentile set to the 95th percentile.
Winsorised [[estimator]]s are usually more [[robust statistics|robust]] to outliers than their more standard forms, although there are alternatives, such as [[Trimmed estimator|trimming]], that will achieve a similar effect.
 
== Example ==
Consider the data set consisting of:
:<math>\{92, 19, \mathbf{101}, 58, \mathbf{153}, 91, 26, 78, 10, 13, \mathbf{-40}, \mathbf{101}, 86, 85, 15, 89, 89, 25, \mathbf{2}, 41\} \qquad (N = 20)</math>
The 5th percentile lies between −40 and 2, and the 95th percentile lies between 101 and 153 (values shown in bold).
Then a 90% Winsorisation would result in the following:
:<math>\{92, 19, \mathbf{101}, 58, \mathbf{101}, 91, 26, 78, 10, 13, \mathbf{2}, \mathbf{101}, 86, 85, 15, 89, 89, 25, \mathbf{2}, 41\} \qquad (N = 20)</math>
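The example above can be reproduced with a short Python sketch. It assumes one particular convention, chosen to match the example: the <code>k</code> lowest values are replaced by the (<code>k</code>+1)-th smallest value, and the <code>k</code> highest by the (<code>k</code>+1)-th largest, with <code>k</code> taken as the whole number of observations in each tail. The function name <code>winsorize</code> and the parameter <code>tail_percent</code> are illustrative only. Library routines that interpolate percentiles may give different cut-off values.

```python
def winsorize(values, tail_percent=5):
    """Winsorise `values`, limiting `tail_percent` percent of the data in each tail.

    Assumed convention: the k lowest values are set to the (k+1)-th smallest
    value and the k highest to the (k+1)-th largest, where k is the whole
    number of observations falling in each tail.
    """
    n = len(values)
    k = n * tail_percent // 100      # integer count of values per tail
    ordered = sorted(values)
    low = ordered[k]                 # (k+1)-th smallest value
    high = ordered[n - 1 - k]        # (k+1)-th largest value
    # Clamp every observation into [low, high], keeping the original order.
    return [min(max(v, low), high) for v in values]


data = [92, 19, 101, 58, 153, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 25, 2, 41]

# A 90% Winsorisation limits 5% of the data in each tail (k = 1 for N = 20).
print(winsorize(data, tail_percent=5))
```

With <code>tail_percent=5</code> and the 20 values above, 153 is replaced by 101 and −40 by 2, while all other values are unchanged.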
 
== Distinction from trimming ==
Winsorizing is not equivalent to simply excluding data, a simpler procedure called [[trimmed estimator|trimming]] or [[Truncation (statistics)|truncation]]; it is instead a method of [[Censoring (statistics)|censoring]] data.
 
In a trimmed estimator, the extreme values are ''discarded''; in a Winsorized estimator, the extreme values are instead ''replaced'' by certain percentiles (the trimmed minimum and maximum).
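
As a sketch in symbols, suppose the sample is written as order statistics <math>x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}</math> and that <math>k</math> values are affected at each end (this notation and the count <math>k</math> are assumptions introduced here for illustration). The trimmed and Winsorized means can then be written as:
:<math>\bar{x}_{\text{trim}} = \frac{1}{n-2k}\sum_{i=k+1}^{n-k} x_{(i)}, \qquad \bar{x}_{\text{Wins}} = \frac{1}{n}\left[ k\,x_{(k+1)} + \sum_{i=k+1}^{n-k} x_{(i)} + k\,x_{(n-k)} \right]</math>
The two expressions share the same central sum; the Winsorized mean additionally counts <math>x_{(k+1)}</math> and <math>x_{(n-k)}</math> <math>k</math> extra times each and divides by the full sample size <math>n</math>.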
 
Thus a [[Winsorized mean]] is not the same as a [[truncated mean]].
For instance, the 10% trimmed mean is the average of the data lying between the 5th and 95th percentiles, while the 90% Winsorised mean sets the bottom 5% to the 5th percentile and the top 5% to the 95th percentile, and then averages all of the data. In the previous example the trimmed mean would be obtained from the smaller set:
:<math>\{92, 19, \mathbf{101}, 58, \quad 91, 26, 78, 10, 13, \quad \mathbf{101}, 86, 85, 15, 89, 89, 25, \mathbf{2}, 41\} \qquad (N = 18)</math>
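
The numerical difference between the two estimators for this data set can be checked with a minimal Python sketch. It assumes that one value is affected at each end (<code>k = 1</code>, matching the example for N = 20); the variable names are illustrative only.

```python
data = [92, 19, 101, 58, 153, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 25, 2, 41]

k = 1                                   # values affected at each end (assumed)
ordered = sorted(data)

# Trimming discards the k smallest and k largest values (N drops to 18).
trimmed = ordered[k:len(ordered) - k]

# Winsorising replaces them by the nearest retained values (N stays 20).
winsorized = [ordered[k]] * k + trimmed + [ordered[-k - 1]] * k

plain_mean = sum(data) / len(data)
trimmed_mean = sum(trimmed) / len(trimmed)
winsorized_mean = sum(winsorized) / len(winsorized)

print(f"mean            = {plain_mean:.2f}")       # 56.70
print(f"trimmed mean    = {trimmed_mean:.2f}")     # 56.72
print(f"Winsorized mean = {winsorized_mean:.2f}")  # 56.20
```

The trimmed mean averages 18 values, whereas the Winsorized mean averages 20 values in which 2 and 101 each appear one additional time, so the two results differ.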
 
More formally, they are distinct because the [[order statistics]] are not independent.
 
== References ==
 
* Hastings, C., Mosteller, F., Tukey, J.W., Winsor, C.P. (1947) ''Low moments for small samples: a comparative study of order statistics'', [[Annals of Mathematical Statistics]], 18, 413&ndash;426.
* W. J. Dixon (1960). ''Simplified Estimation from Censored Normal Samples'',  The Annals of Mathematical Statistics, 31, 385&ndash;391.
* [[John Tukey|J. W. Tukey]] (1962) ''The Future of Data Analysis'', The Annals of Mathematical Statistics, 33, p.&nbsp;18
 
[[Category:Statistical theory]]
[[Category:Robust statistics]]
 
 
{{Statistics-stub}}

Latest revision as of 00:40, 18 December 2014
