Leverage (statistics)
In [[statistics]], '''leverage''' is a term used in connection with [[regression analysis]] and, in particular, in analyses aimed at identifying those observations that are far from the corresponding average predictor values. Leverage points do not necessarily have a large effect on the outcome of fitting regression models.
'''Leverage points''' are those observations, if any, made at extreme or outlying values of the ''independent variables'' such that the lack of neighboring observations means that the fitted regression model will pass close to that particular observation.<ref>Everitt, B.S. (2002) Cambridge Dictionary of Statistics. CUP. ISBN 0-521-81099-X</ref>
Modern computer packages for statistical analysis include, as part of their facilities for regression analysis, various quantitative measures for identifying [[influential observation]]s: among these measures is [[partial leverage]], a measure of how a variable contributes to the leverage of a datum.
==Definition==
The leverage score for the <math> i^{th} </math> data unit is defined as:
*<math> h_{ii}=(H)_{ii} </math>,
the <math> i^{th} </math> diagonal element of the hat matrix <math> H=X(X'X)^{-1}X' </math>, where <math> X </math> is the [[design matrix]].
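For illustration, the leverage scores can be computed directly from this definition. The following is a minimal NumPy sketch; the design matrix here is made up for the example and is not from any source:
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical design: an intercept column plus one predictor
# with an outlying value at x = 10.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 10.0])])

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverage scores.
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

print(h)        # the outlying observation x = 10 has the largest leverage
print(h.sum())  # trace(H) equals the number of columns of X (here 2)
</syntaxhighlight>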
==Properties==
<math> 0 \leq h_{ii} \leq 1 </math>
===Proof===
First, note that <math> H^2=X(X'X)^{-1}X'X(X'X)^{-1}X'=X(X'X)^{-1}X'=H </math>, so <math> H </math> is idempotent. Also, observe that <math> H </math> is symmetric, so <math> h_{ij}=h_{ji} </math>. Expanding the <math> i^{th} </math> diagonal entry of <math> H=H^2 </math>, we have
*<math> h_{ii}=\sum_j h_{ij}h_{ji}=h_{ii}^2+\sum_{j\neq i}h_{ij}^2 \geq 0 </math>
and, since each term of the sum is non-negative,
*<math> h_{ii} \geq h_{ii}^2 \implies h_{ii}\leq 1 </math>
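These properties can be checked numerically. The sketch below (a made-up example using NumPy, with a randomly generated full-rank design matrix) verifies both the idempotence of <math> H </math> and the bounds on its diagonal:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # hypothetical full-rank design matrix

H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix H = X (X'X)^{-1} X'
h = np.diag(H)

assert np.allclose(H @ H, H)           # H is idempotent: H^2 = H
assert np.all((h >= 0) & (h <= 1))     # 0 <= h_ii <= 1
</syntaxhighlight>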
If we are in an ordinary least squares setting with fixed <math> X </math> and:
*<math> Y=X\beta+\epsilon </math>
*<math> \operatorname{var}(\epsilon)=\sigma^2I </math>
then <math> \operatorname{var}(e_i)=(1-h_{ii})\sigma^2 </math>, where <math> e_i=Y_i-\hat{Y}_i </math> is the <math> i^{th} </math> residual.
In other words, if the errors <math> \epsilon </math> are homoscedastic, an observation's leverage score determines the variance of the corresponding residual.
===Proof===
First, note that <math> e=Y-\hat{Y}=(I-H)Y </math>, and that <math> I-H </math> is idempotent and symmetric. Since <math> X </math> is fixed, <math> \operatorname{var}(Y)=\operatorname{var}(\epsilon)=\sigma^2I </math>. This gives
<math> \operatorname{var}(e)=\operatorname{var}((I-H)Y)=(I-H)\operatorname{var}(Y)(I-H)'=\sigma^2(I-H)^2=\sigma^2(I-H) </math>,
so that <math> \operatorname{var}(e_i)=(1-h_{ii})\sigma^2 </math>.
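This identity can be illustrated by simulation. The sketch below (again assuming NumPy, with arbitrarily chosen <math> \beta </math> and <math> \sigma </math> for the example) compares the empirical variance of each residual over many replications with <math> (1-h_{ii})\sigma^2 </math>:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 30, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta = np.array([1.0, 2.0])                            # arbitrary coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

reps = 50_000
eps = sigma * rng.normal(size=(reps, n))  # homoscedastic errors, one row per replication
Y = X @ beta + eps
E = Y @ (np.eye(n) - H)                   # residuals e = (I - H) Y  (I - H is symmetric)

# Empirical residual variances approach (1 - h_ii) * sigma^2.
print(np.abs(E.var(axis=0) - (1 - h) * sigma**2).max())
</syntaxhighlight>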
==See also==
* [[Hat matrix]] — whose main diagonal entries are the leverages of the observations
* [[Mahalanobis distance]] — a measure of leverage of a datum
* [[Cook's distance]] — a measure of changes in regression coefficients when an observation is deleted
* [[DFFITS]]
* [[Outliers]] — observations with extreme Y values
==References==
{{reflist}}
[[Category:Regression analysis]]
[[Category:Statistical terminology]]
[[Category:Regression diagnostics]]