Lift (data mining): Difference between revisions

Revision as of 15:38, 29 July 2013

In statistics, the 68–95–99.7 rule, also known as the three-sigma rule or empirical rule, states that nearly all values lie within three standard deviations of the mean in a normal distribution.

About 68.27% of the values lie within one standard deviation of the mean. Similarly, about 95.45% of the values lie within two standard deviations of the mean. Nearly all (99.73%) of the values lie within three standard deviations of the mean.

In mathematical notation, these facts can be expressed as follows, where $x$ is an observation from a normally distributed random variable, $\mu$ is the mean of the distribution, and $\sigma$ is its standard deviation:

$$\begin{aligned}
\Pr(\mu - \sigma \leq x \leq \mu + \sigma) &\approx 0.6827 \\
\Pr(\mu - 2\sigma \leq x \leq \mu + 2\sigma) &\approx 0.9545 \\
\Pr(\mu - 3\sigma \leq x \leq \mu + 3\sigma) &\approx 0.9973
\end{aligned}$$
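These three figures can be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the function name `prob_within` is introduced here for illustration and does not come from the article. It uses the fact that the probability of falling within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal observation lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
    print(f"within {k} sigma: {prob_within(k):.4f}")
```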

Derivation

These numerical values come from the cumulative distribution function of the normal distribution. For example, $\Phi(2) \approx 0.9772$, or $\Pr(x \leq \mu + 2\sigma) \approx 0.9772$. Note that this is not a symmetrical interval – this is merely the probability that an observation is less than $\mu + 2\sigma$. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):

$$\Pr(\mu - 2\sigma \leq x \leq \mu + 2\sigma) = \Phi(2) - \Phi(-2) \approx 0.9772 - (1 - 0.9772) \approx 0.9545$$
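The same steps work for any number of standard deviations. As a sketch, writing $k > 0$ for that number (a symbol introduced here for illustration), the symmetry $\Phi(-k) = 1 - \Phi(k)$ and the standard relation between $\Phi$ and the error function give:

$$\Pr(\mu - k\sigma \leq x \leq \mu + k\sigma) = \Phi(k) - \Phi(-k) = 2\Phi(k) - 1 = \operatorname{erf}\left(\frac{k}{\sqrt{2}}\right)$$

Setting $k = 2$ recovers the value 0.9545 computed above.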

This is related to the confidence interval as used in statistics: ${\bar {x}}\pm 2\sigma /{\sqrt {n}}$ is approximately a 95% confidence interval for the population mean when ${\bar {x}}$ is the average of a sample of size $n$.

Uses

This rule is often used to get a quick, rough probability estimate of something given its standard deviation, provided the population is assumed normal. It therefore also serves as a simple test for outliers (if the population is assumed normal) and as a normality test (if the population is potentially not normal).

Recall that to pass from a sample to a number of standard deviations, one first computes the deviation: the error if the population mean is known, or the residual if the mean is only estimated. One then either standardizes (divides by the population standard deviation), if the population parameters are known, or studentizes (divides by an estimate of the standard deviation), if the parameters are unknown and only estimated.
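The two cases can be sketched as follows. This is an illustrative example only: the function names and the sample values are made up, and "studentizing" is taken here in the simple sense described above (dividing residuals from the sample mean by the sample standard deviation), not in the regression sense that also accounts for leverage.

```python
import statistics

def standardize(xs, mu, sigma):
    """Errors (deviations from the known mean) divided by the known population standard deviation."""
    return [(x - mu) / sigma for x in xs]

def studentize(xs):
    """Residuals (deviations from the sample mean) divided by the sample standard deviation."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample estimate, n - 1 in the denominator
    return [(x - mean) / s for x in xs]

sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.0]  # made-up measurements

# Population parameters known (assumed here to be mu = 10.0, sigma = 0.5)
print(standardize(sample, mu=10.0, sigma=0.5))

# Population parameters unknown: estimate both from the sample
print(studentize(sample))
```

Either list expresses each observation as a number of standard deviations, which is the quantity the rule refers to.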

To use the rule as a test for outliers or as a normality test, one computes the size of deviations in terms of standard deviations and compares this to the expected frequency. Given a sample set, compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the mean are likely outliers (unless the sample size is large enough that a deviation this extreme is expected), and if there are many points more than 3 standard deviations from the mean, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.
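The comparison of observed and expected frequencies can be sketched as below. The helper names, the simulated data, and the injected extreme value are all assumptions made for illustration; the expected count is simply the sample size multiplied by the normal tail probability beyond $k$ standard deviations.

```python
import math
import random
import statistics

def tail_fraction(k):
    """Expected fraction of normal observations more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2))

def count_beyond(xs, k):
    """Number of observations more than k sample standard deviations from the sample mean."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum(1 for x in xs if abs(x - mean) > k * s)

def compare(xs, k=3.0):
    """Return (observed count, expected count under normality) of deviations beyond k sigma."""
    return count_beyond(xs, k), len(xs) * tail_fraction(k)

# Simulated data: 1,000 standard normal draws plus one injected extreme value
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(1000)] + [8.0]

observed, expected = compare(data, k=3.0)
print(f"observed {observed} points beyond 3 sigma, expected about {expected:.1f}")
```

An observed count far above the expected count is the signal to look for outliers or to question the normality assumption.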

One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a Poisson distribution. Put simply, if one has multiple 4-standard-deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or to question the assumed normality of the distribution.
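A minimal sketch of the Poisson approximation follows, assuming independent draws; the function name `prob_at_least` and its parameters are introduced here for illustration. The Poisson rate is the expected number of deviations beyond $k$ standard deviations in a sample of size $n$.

```python
import math

def prob_at_least(m, n, k):
    """Poisson approximation to P(at least m deviations beyond k sigma among n normal draws)."""
    lam = n * math.erfc(k / math.sqrt(2))  # expected number of such deviations
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(m))
    return 1.0 - below

# Chance of two or more 4-sigma moves in a sample of size 1,000
p = prob_at_least(m=2, n=1000, k=4.0)
print(f"{p:.4f}")  # roughly 0.002
```

Under normality, two or more such moves would occur in only about 0.2% of samples of this size, which is why seeing several of them is strong evidence against the assumption.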

Higher deviations

Because the tails of the normal distribution decay faster than exponentially, the odds of larger deviations decrease very quickly. From the rules for normally distributed data:

Range	Population in range	Expected frequency outside range	Approx. frequency for daily event
μ ± 1σ	0.682 689 492 137 086	1 in 3	Twice a week
μ ± 1.5σ	0.866 385 597 462 284	1 in 7	Weekly
μ ± 2σ	0.954 499 736 103 642	1 in 22	Every three weeks
μ ± 2.5σ	0.987 580 669 348 448	1 in 81	Quarterly
μ ± 3σ	0.997 300 203 936 740	1 in 370	Yearly
μ ± 3.5σ	0.999 534 741 841 929	1 in 2149	Every six years
μ ± 4σ	0.999 936 657 516 334	1 in 15 787	Every 43 years (twice in a lifetime)
μ ± 4.5σ	0.999 993 204 653 751	1 in 147 160	Every 403 years
μ ± 5σ	0.999 999 426 696 856	1 in 1 744 278	Every 4776 years (once in recorded history)
μ ± 5.5σ	0.999 999 962 020 875	1 in 26 330 254	Every 72 090 years
μ ± 6σ	0.999 999 998 026 825	1 in 506 797 346	Every 1.38 million years (history of humankind)
μ ± 6.5σ	0.999 999 999 919 680	1 in 12 450 197 393	Every 34 million years
μ ± 7σ	0.999 999 999 997 440	1 in 390 682 215 445	Every 1.07 billion years
μ ± xσ	$\textstyle \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)$	1 in $\textstyle \frac{1}{1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)}$	Every $\textstyle \frac{1}{1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)}$ days
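The general formula in the last row of the table can be evaluated directly. The sketch below is illustrative: the function name `table_row` and the constant `DAYS_PER_YEAR` (taken as 365.25) are assumptions, not part of the article. It uses `math.erfc` for the tail, since computing `1 - erf(...)` loses precision for large deviations.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption used to convert days to years

def table_row(k):
    """Return (population in range, '1 in N' outside range, years between daily events) for mu +/- k sigma."""
    inside = math.erf(k / math.sqrt(2))
    outside = math.erfc(k / math.sqrt(2))  # equals 1 - erf, but accurate in the far tail
    one_in = 1.0 / outside
    years = one_in / DAYS_PER_YEAR
    return inside, one_in, years

for k in (1, 2, 3, 4, 5, 6, 7):
    inside, one_in, years = table_row(k)
    print(f"mu +/- {k} sigma: {inside:.12f} inside, 1 in {one_in:,.0f}, every {years:,.2f} years")
```

For $k = 3$ this gives about 1 in 370, and for $k = 6$ about 1 in 507 million, or roughly 1.4 million years for a daily event.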

Thus for a daily process, a 6σ event is expected to happen less than once in a million years. This gives a simple normality test: if one witnesses a 6σ event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect. In The Black Swan, Nassim Nicholas Taleb gives the example of risk models for which the Black Monday crash was a 36-sigma event: the occurrence of such an event should immediately suggest a catastrophic flaw in the model. However, such models were created before there was a proper understanding of stochastic volatility, and the recitation of such calculations, which no modern practitioner would take seriously, is somewhat akin to a straw man argument. In such discussions it is important to be aware that nothing in the process of drawing with replacement specifies the order in which unlikely events should occur, only their relative frequency, so one must take care when reasoning from sequential draws. It is a corollary of the gambler's fallacy to suggest that just because a rare event has been observed, that rare event was not rare. It is the observation of a multitude of purportedly rare events that undermines the hypothesis that they are actually rare.

External links

"The Normal Distribution" by Balasubramanian Narasimhan
"Calculate percentage proportion within x sigmas" at WolframAlpha


@@ Line 1: / Line 1: @@
+{{no footnotes|date=November 2013}}
+[[Image:standard deviation diagram.svg|thumb|350px|Dark blue is less than one standard deviation from the mean. For the [[normal distribution]], this accounts for 68.27% of the set; while two standard deviations from the mean (medium and dark blue) account for 95.45%; and three standard deviations (light, medium, and dark blue) account for 99.73%.]]
+[[File:Standard score and prediction interval.png|thumb|250px|right|Prediction interval (on the [[y-axis]]) given from the [[standard score]] (on the [[x-axis]]). The y-axis is logarithmically scaled (but the values on it are not modified).]]
+In [[statistics]], the '''68–95–99.7 rule''', also known as the '''three-sigma rule''' or '''empirical rule''', states that nearly all values lie within three [[standard deviation]]s of the [[Arithmetic mean|mean]] in a [[normal distribution]].
+About 68.27% of the values lie within one standard deviation of the mean.  Similarly, about 95.45% of the values lie within two standard deviations of the mean.  Nearly all (99.73%) of the values lie within three standard deviations of the mean.
+In mathematical notation, these facts can be expressed as follows, where <span class="texhtml">x</span> is an observation from a normally distributed [[random variable]], <span class="texhtml">μ</span> is the mean of the distribution, and <span class="texhtml">σ</span> is its standard deviation:
+:<math>\begin{align}
+  \Pr(\mu-\;\,\sigma \le x \le \mu+\;\,\sigma) &\approx 0.6827 \\
+  \Pr(\mu-2\sigma \le x \le \mu+2\sigma)       &\approx 0.9545 \\
+  \Pr(\mu-3\sigma \le x \le \mu+3\sigma)       &\approx 0.9973
+\end{align}
+</math>
+==Derivation==
+[[File:Cumulative distribution function for normal distribution, mean 0 and sd 1.png|270px|thumb|left|Diagram showing the [[cumulative distribution function]] for the normal distribution with mean (''µ'') 0 and variance (''σ''<sup>2</sup>)&nbsp;1. The prediction interval for any standard score corresponds numerically to (1-(1-<span style="font-size:100%;">Φ</span><sub>''µ'',''σ''<sup>2</sup></sub>(standard score))&middot;2). For example, a standard score of ''x''&nbsp;=&nbsp;2 gives <span style="font-size:100%;">Φ</span><sub>''µ'',''σ''<sup>2</sup></sub>(2) =&nbsp;0.97725 corresponding to a prediction interval of (1&nbsp;−&nbsp;(1&nbsp;−&nbsp;0.97725)&middot;2) =&nbsp;0.9545 =&nbsp;95.45%.]]
+These numerical values come from the [[Normal_distribution#Cumulative_distribution|cumulative distribution function of the normal distribution]].  For example, <span class="texhtml">Φ(2) ≈ 0.9772</span>, or <span class="texhtml">Pr(x ≤ μ + 2σ) ≈ 0.9772</span>.  Note that this is not a symmetrical interval – this is merely the probability that an observation is less than <span class="texhtml">μ + 2σ</span>.  To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):
+:<math>\Pr(\mu-2\sigma \le x \le \mu+2\sigma)
+ = \Phi(2) - \Phi(-2)
+ \approx 0.9772 - (1 - 0.9772)
+ \approx 0.9545
+</math>
+This is related to the [[confidence interval]] as used in statistics: <math>\scriptstyle \bar{x} \pm 2\sigma/\sqrt{n}</math> is approximately a 95% confidence interval for the population mean when <math>\bar{x}</math> is the average of a sample of size <math>n</math>.
+== Uses ==
+This rule is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed normal, thus also as a simple test for [[outliers]] (if the population is assumed normal), and as a [[normality test]] (if the population is potentially not normal).
+Recall that to pass from a sample to a number of standard deviations, one
+computes the [[deviation (statistics)|deviation]], either the [[Errors and residuals in statistics|error or residual]] (accordingly if one knows the population mean or only estimates it), and then either uses [[standardizing]] (dividing by the population standard deviation), if the population parameters are known, or [[studentizing]] (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.
+To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, compute the [[studentized residual]]s and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the [[sample size]] is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.
+One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a [[Poisson distribution]], but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution.
+==Higher deviations==
+Because of the exponential tails of the normal distribution, odds of higher deviations decrease very quickly. From the [[Standard deviation#Rules for normally distributed data|rules for normally distributed data]]:
+{| class="wikitable" style="text-align:center"
+|- bgcolor="#CCCCCC"
+! Range !! Population in range !! Expected frequency outside range !! Approx. frequency for daily event
+|-
+|μ ± 1σ || {{gaps|0.682|689|492|137|086}} || 1 in 3 || Twice a week
+|-
+|μ ± 1.5σ || {{gaps|0.866|385|597|462|284}} || 1 in 7 || Weekly
+|-
+|μ ± 2σ || {{gaps|0.954|499|736|103|642}} || 1 in 22 || Every three weeks
+|-
+|μ ± 2.5σ || {{gaps|0.987|580|669|348|448}} || 1 in 81 || Quarterly
+|-
+|μ ± 3σ || {{gaps|0.997|300|203|936|740}} || 1 in 370 || Yearly
+|-
+|μ ± 3.5σ || {{gaps|0.999|534|741|841|929}} || 1 in 2149 || Every six years
+|-
+|μ ± 4σ || {{gaps|0.999|936|657|516|334}} || 1 in {{val|15787}} || Every 43 years (twice in a lifetime)
+|-
+|μ ± 4.5σ || {{gaps|0.999|993|204|653|751}} || 1 in {{val|147160}} || Every 403 years
+|-
+|μ ± 5σ || {{gaps|0.999|999|426|696|856}} || 1 in {{val|1744278}} || Every {{val|4776}} years (once in recorded history)
+|-
+|μ ± 5.5σ || {{gaps|0.999|999|962|020|875}} || 1 in {{val|26330254}} || Every {{val|72090}} years
+|-
+|μ ± 6σ || {{gaps|0.999|999|998|026|825}} || 1 in {{val|506797346}} || Every 1.38 million years (history of [[Homo Sapiens|humankind]])
+|-
+|μ ± 6.5σ || {{gaps|0.999|999|999|919|680}} || 1 in {{val|12450197393}} || Every 34 million years
+|-
+|μ ± 7σ || {{gaps|0.999|999|999|997|440}} || 1 in {{val|390682215445}} || Every 1.07 billion years
+|-
+|μ ± {{math|<var>x</var>}}σ || [[Error function|<math>\textstyle\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)</math>]]  || 1 in <math>\textstyle \frac{1}{1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)}</math> || Every <math>\textstyle \frac{1}{1-\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)}</math> days
+|}
+Thus for a daily process, a 6''σ'' event is expected to happen less than once in a million years. This gives a [[normality_test#Back_of_the_envelope_test|simple normality test]]: if one witnesses a 6''σ'' event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.  In ''[[The Black Swan (Taleb book)|The Black Swan]]'', [[Nassim Nicholas Taleb]] gives the example of risk models for which the [[Black Monday (1987)|Black Monday]] crash was a 36-sigma event: the occurrence of such an event should instantly suggest a consideration of a catastrophic flaw in a model. However, such models were created before there was a proper understanding of [[stochastic volatility]] and the recitation of such calculations, which no modern practitioner would take seriously at all, is somewhat akin to a [[straw man]] argument. In such discussion it is important to be aware of the fact that there is actually nothing in the process of drawing with replacement that specifies the order in which the unlikely events should occur, merely their relative frequency, and one must take care when reasoning from sequential draws. It is a [[corollary]] of the [[gambler's fallacy]] to suggest that just because a rare event has been observed, that rare event was not rare. It is the observation of a multitude of purportedly rare events that undermines the hypothesis that they are actually rare.
+== See also ==
+* [[Standard score]]
+* [[t-statistic]]
+== External links ==
+* "[http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html The Normal Distribution]" by Balasubramanian Narasimhan
+* "[http://www.wolframalpha.com/input/?i=erf%28x%2Fsqrt%282%29%29 Calculate percentage proportion within ''x'' sigmas]" at WolframAlpha
+{{ProbDistributions|Normal distribution}}
+{{DEFAULTSORT:68-95-99.7 rule}}
+[[Category:Data analysis]]
+[[Category:Statistical approximations]]
+[[pl:Odchylenie standardowe#Dla rozkładu normalnego]]
