Leverage (statistics)
In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations.
High-leverage points are those observations, if any, made at extreme or outlying values of the independent variables such that the lack of neighboring observations means that the fitted regression model will pass close to that particular observation.[1]
Definition
In the linear regression model, the leverage score for the i-th observation is defined as
$$h_{ii} = [H]_{ii},$$
the i-th diagonal element of the projection matrix $H = X(X^\top X)^{-1} X^\top$, where $X$ is the design matrix (whose rows correspond to the observations and whose columns correspond to the independent or explanatory variables).
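As a concrete illustration (not drawn from the cited sources), the leverage scores can be computed directly from this definition with NumPy; the design matrix below is made up for the example.

```python
import numpy as np

# Made-up design matrix: an intercept column plus one regressor (an
# assumption for illustration; any full-rank design works the same way).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones_like(x), x])

# Projection ("hat") matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Leverage scores are the diagonal elements h_ii.
leverage = np.diag(H)
print(leverage)          # one value per observation, each between 0 and 1
print(leverage.sum())    # equals the number of columns of X (here 2)
```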
Interpretation
The leverage score is also known as the observation self-sensitivity or self-influence,[2] because of the equation
$$h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i},$$
which states that the leverage of the i-th observation equals the partial derivative of the fitted i-th dependent value $\hat{y}_i$ with respect to the measured i-th dependent value $y_i$. This partial derivative describes the degree by which the i-th measured value influences the i-th fitted value. Note that this leverage depends on the values of the explanatory (x-) variables of all observations but not on any of the values of the dependent (y-) variables.
The equation follows directly from the computation of the fitted values via the hat matrix as $\hat{\mathbf{y}} = H\mathbf{y}$; that is, leverage is a diagonal element of the projection matrix:
$$h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i} = [H]_{ii}.$$
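A quick numerical check of this self-sensitivity interpretation, using made-up toy data: because the fit is linear in $y$, nudging the i-th measured value by a small amount changes the i-th fitted value by exactly $h_{ii}$ times that amount.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=15)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + rng.normal(size=15)

H = X @ np.linalg.inv(X.T @ X) @ X.T
i, delta = 4, 0.01

# Fitted values are y_hat = H y, so perturbing y_i by delta moves
# the i-th fitted value by exactly h_ii * delta.
y_perturbed = y.copy()
y_perturbed[i] += delta
change = (H @ y_perturbed)[i] - (H @ y)[i]

print(change / delta)    # equals H[i, i]
print(H[i, i])
```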
Bounds on leverage
$$0 \le h_{ii} \le 1.$$
Proof
First, note that $H$ is an idempotent matrix: $H^2 = X(X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = X(X^\top X)^{-1}X^\top = H$. Also, observe that $H$ is symmetric (i.e. $h_{ij} = h_{ji}$). So equating the ii element of $H$ to that of $H^2$, we have
$$h_{ii} = h_{ii}^2 + \sum_{j \neq i} h_{ij}^2 \ge h_{ii}^2 \ge 0$$
and
$$h_{ii} \ge h_{ii}^2 \implies h_{ii} \le 1.$$
Relation to influence functions
In a regression context, we combine leverage and influence functions to compute the degree to which estimated coefficients would change if we removed a single data point. Denoting the leverage $h_i$ and the regression residual $e_i = y_i - x_i^\top \hat{\beta}$, one can compare the estimated coefficient $\hat{\beta}$ to the leave-one-out estimated coefficient $\hat{\beta}^{(-i)}$ using the formula[3][4]
$$\hat{\beta} - \hat{\beta}^{(-i)} = \frac{(X^\top X)^{-1} x_i e_i}{1 - h_i}.$$
Young (2019) uses a version of this formula after residualizing controls.[5]
To gain intuition for this formula, note that the k-by-1 vector $(X^\top X)^{-1} x_i$ captures the potential for an observation to affect the regression parameters, and therefore $(X^\top X)^{-1} x_i e_i$ captures the actual influence of that observation's deviation from its fitted value on the regression parameters. The formula then divides by $(1 - h_i)$ to account for the fact that we remove the observation rather than adjusting its value, reflecting the fact that removal changes the distribution of covariates more when applied to high-leverage observations (i.e. those with outlying covariate values).
Similar formulas arise when applying general formulas for statistical influence functions in the regression context.[6][7]
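The leave-one-out formula above can also be verified numerically. The sketch below (with arbitrary simulated data) compares the closed-form coefficient change to a brute-force refit with observation i deleted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
h = np.diag(X @ XtX_inv @ X.T)          # leverage scores
e = y - X @ beta_hat                    # regression residuals

i = 7                                   # observation to delete
# Closed-form coefficient change from deleting observation i:
#   beta_hat - beta_hat_(-i) = (X'X)^{-1} x_i e_i / (1 - h_i)
closed_form = XtX_inv @ X[i] * e[i] / (1 - h[i])

# Brute-force leave-one-out refit for comparison.
X_loo, y_loo = np.delete(X, i, axis=0), np.delete(y, i)
beta_loo = np.linalg.lstsq(X_loo, y_loo, rcond=None)[0]

print(np.allclose(beta_hat - beta_loo, closed_form))   # True
```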
Effect on residual variance
If we are in an ordinary least squares setting with fixed $X$ and homoscedastic regression errors $\varepsilon_i$,
$$y = X\beta + \varepsilon, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2,$$
then the i-th regression residual
$$e_i = y_i - \hat{y}_i$$
has variance
$$\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2.$$
In other words, an observation's leverage score determines the degree of noise in the model's misprediction of that observation, with higher leverage leading to less noise.
Proof
First, note that $I - H$ is idempotent and symmetric, and $\hat{\mathbf{y}} = H\mathbf{y}$. This gives
$$\operatorname{Var}(e) = \operatorname{Var}\bigl((I - H)\,y\bigr) = (I - H)\operatorname{Var}(y)(I - H)^\top = \sigma^2 (I - H)^2 = \sigma^2 (I - H).$$
Thus
$$\operatorname{Var}(e_i) = (1 - h_{ii})\,\sigma^2.$$
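As a sanity check of this result, one can simulate many error vectors for a fixed design and compare the empirical residual variances to $(1 - h_{ii})\sigma^2$; the sketch below uses arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 25, 1.5
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Because (I - H) X = 0, the residuals e = (I - H) y depend only on the
# error term: e = (I - H) eps.  Simulate many error vectors for the fixed
# design and compare empirical residual variances to (1 - h_ii) * sigma^2.
reps = 100_000
eps = rng.normal(scale=sigma, size=(reps, n))
residuals = eps @ (np.eye(n) - H)       # (I - H) is symmetric

print(residuals.var(axis=0)[:3])              # empirical Var(e_i)
print(((1 - np.diag(H)) * sigma**2)[:3])      # theoretical (1 - h_ii) sigma^2
```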
Studentized residuals
The corresponding studentized residual, i.e. the residual adjusted for its observation-specific estimated residual variance, is then
$$t_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}},$$
where $\hat{\sigma}$ is an appropriate estimate of $\sigma$.
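A minimal sketch of internally studentized residuals, assuming $\hat{\sigma}$ is estimated from the residual sum of squares with $n - p$ degrees of freedom (one common choice), on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = y - H @ y                                 # residuals

# One common estimate of sigma: residual sum of squares over n - p.
p = X.shape[1]
sigma_hat = np.sqrt(e @ e / (n - p))

# Studentized residuals: each residual scaled by its own estimated
# standard deviation sigma_hat * sqrt(1 - h_ii).
t = e / (sigma_hat * np.sqrt(1 - h))
print(t)
```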
Related concepts
Partial leverage
Partial leverage is a measure of the contribution of each individual independent variable to the total leverage of each observation. Modern computer packages for statistical analysis include, as part of their facilities for regression analysis, various quantitative measures for identifying influential observations, among them partial leverage, which quantifies how an individual independent variable contributes to the total leverage of a datum.
Mahalanobis distance
Leverage is closely related to the Mahalanobis distance[8] (see proof[9]).
Specifically, for some $n \times p$ matrix $X$, the squared Mahalanobis distance of some row vector $\vec{x}_i$ from the vector of means $\hat{\mu} = \bar{x}$, of length $p$, with the estimated covariance matrix $S$ is:
$$D^2(\vec{x}_i) = (\vec{x}_i - \hat{\mu})^\top S^{-1} (\vec{x}_i - \hat{\mu}).$$
This is related to the leverage $h_{ii}$ of the hat matrix of $X$ after appending a column vector of 1's to it. The relationship between the two is:
$$h_{ii} = \frac{1}{n} + \frac{D^2(\vec{x}_i)}{n - 1}.$$
The relationship between leverage and Mahalanobis distance enables us to decompose leverage into meaningful components so that some sources of high leverage can be investigated analytically.[10]
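The stated relationship can be checked numerically; the sketch below uses an arbitrary random design and the sample covariance matrix (denominator $n - 1$).

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 2
X = rng.normal(size=(n, p))                   # regressors without intercept

# Leverage from the hat matrix of [1, X] (intercept column appended).
X1 = np.column_stack([np.ones(n), X])
h = np.diag(X1 @ np.linalg.inv(X1.T @ X1) @ X1.T)

# Squared Mahalanobis distance of each row from the column means, using
# the sample covariance matrix (denominator n - 1).
mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)

print(np.allclose(h, 1 / n + d2 / (n - 1)))   # True
```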
Software implementations
Many programs and statistics packages, such as R and Python, include implementations of leverage.
Language/Program | Function | Notes
---|---|---
R | hat(x, intercept = TRUE) or hatvalues(model, ...) | See [1]
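For Python, one option (an assumption about the reader's toolchain, not part of the sources above) is statsmodels, whose OLS influence diagnostics expose the hat-matrix diagonal; otherwise the leverage scores can be computed directly with NumPy as in the earlier examples. A brief sketch with made-up data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=25)
X = sm.add_constant(x)                  # design matrix with intercept column
y = 1.0 + 2.0 * x + rng.normal(size=25)

results = sm.OLS(y, X).fit()
# Diagonal of the hat matrix (the leverage scores) via influence diagnostics.
leverage = results.get_influence().hat_matrix_diag
print(leverage)
```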
See also
- Projection matrix – whose main diagonal entries are the leverages of the observations
- Mahalanobis distance – a (scaled) measure of leverage of a datum
- Cook's distance – a measure of changes in regression coefficients when an observation is deleted
- DFFITS
- Outlier – observations with extreme Y values
- Degrees of freedom (statistics) – the sum of leverage scores
References
- ^ Everitt, B. S. (2002). Cambridge Dictionary of Statistics. Cambridge University Press. ISBN 0-521-81099-X.
- ^ Cardinali, C. (June 2013). "Data Assimilation: Observation influence diagnostic of a data assimilation system" (PDF).
- ^ Miller, Rupert G. (September 1974). "An Unbalanced Jackknife". Annals of Statistics. 2 (5): 880–891. doi:10.1214/aos/1176342811. ISSN 0090-5364.
- ^ Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 21.
- ^ Young, Alwyn (2019). "Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results". The Quarterly Journal of Economics. 134: 567.
- ^ Chatterjee, Samprit; Hadi, Ali S. (August 1986). "Influential Observations, High Leverage Points, and Outliers in Linear Regression". Statistical Science. 1 (3): 379–393. doi:10.1214/ss/1177013622. ISSN 0883-4237.
- ^ "regression - Influence functions and OLS". Cross Validated. Retrieved 2020-12-06.
- ^ Weiner, Irving B.; Schinka, John A.; Velicer, Wayne F. (23 October 2012). Handbook of Psychology, Research Methods in Psychology. John Wiley & Sons. ISBN 978-1-118-28203-8.
- ^ "Prove the relation between Mahalanobis distance and Leverage?"
- ^ Kim, M. G. (2004). "Sources of high leverage in linear regression model (Journal of Applied Mathematics and Computing, Vol 16, 509–513)". arXiv:2006.04024 [math.ST].