Iteratively reweighted least squares
The method of '''iteratively reweighted least squares''' ('''IRLS''') is used to solve certain optimization problems. It solves optimization problems with [[objective function]]s of the form:

:<math>\underset{\boldsymbol\beta}{\operatorname{arg\,min}} \sum_{i=1}^n w_i(\boldsymbol\beta) \big[ y_i - \mathbf{x}_i^\top \boldsymbol\beta \big]^2, </math>

by an iterative method in which each step involves solving a standard weighted linear least squares problem of the form:

:<math>\boldsymbol\beta^{(t+1)} = \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \sum_{i=1}^n w_i\big(\boldsymbol\beta^{(t)}\big) \big[ y_i - \mathbf{x}_i^\top \boldsymbol\beta \big]^2. </math>

IRLS is used to find the [[maximum likelihood]] estimates of a [[generalized linear model]], and in [[robust regression]] to find an [[M-estimator]] as a way of mitigating the influence of outliers in an otherwise normally distributed data set, for example by minimizing the least absolute errors rather than the least squared errors.
Although not a linear regression problem, [[Weiszfeld's algorithm]] for approximating the [[geometric median]] can also be viewed as a special case of iteratively reweighted least squares, in which the objective function is the sum of distances of the estimator from the samples. |
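A minimal sketch of that special case (assuming [[NumPy]]; the function name, the centroid starting point, and the tolerances below are illustrative choices, not part of the algorithm's definition):

<syntaxhighlight lang="python">
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-10):
    """Weiszfeld's algorithm: IRLS with weights 1 / distance to the current estimate."""
    y = points.mean(axis=0)  # start at the centroid (a common but arbitrary choice)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - y, axis=1), eps)  # guard against zero distances
        w = 1.0 / d                                              # the reweighting step
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()      # weighted mean of the samples
        if np.linalg.norm(y_new - y) < eps:                      # estimate has stopped moving
            return y_new
        y = y_new
    return y
</syntaxhighlight>

Each update is the weighted mean of the samples, i.e. the solution of a weighted least squares problem with weights 1/''d''<sub>''i''</sub>.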
== Examples ==

=== ''L<sup>p</sup>'' norm linear regression ===

To find the parameters '''''β''''' = (''β''<sub>1</sub>, …, ''β''<sub>''k''</sub>)<sup>T</sup> which minimise the [[Lp space|''L<sup>p</sup>'' norm]] for the [[linear regression]] problem

:<math> \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \big\| \mathbf y - X \boldsymbol\beta \big\|_p = \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \sum_{i=1}^n \big| y_i - \mathbf{x}_i^\top \boldsymbol\beta \big|^p, </math>

the IRLS algorithm at step ''t'' + 1 involves solving the weighted linear least squares problem

:<math>\boldsymbol\beta^{(t+1)} = \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \sum_{i=1}^n \big| y_i - \mathbf{x}_i^\top \boldsymbol\beta^{(t)} \big|^{p-2} \big| y_i - \mathbf{x}_i^\top \boldsymbol\beta \big|^2, </math>

that is, a weighted least squares problem with weights <math>w_i(\boldsymbol\beta^{(t)}) = \big| y_i - \mathbf{x}_i^\top \boldsymbol\beta^{(t)} \big|^{p-2}</math>. In the case ''p'' = 1, this corresponds to [[least absolute deviation]] regression.
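A short numerical sketch of this iteration (assuming [[NumPy]]; the function name <code>irls_lp</code>, the fixed iteration count, and the small floor <code>delta</code> that keeps the weights finite when ''p'' < 2 are illustrative, not canonical):

<syntaxhighlight lang="python">
import numpy as np

def irls_lp(X, y, p=1.0, n_iter=50, delta=1e-6):
    """Approximately minimise ||y - X beta||_p by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # initialise with ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta                               # residuals at the current estimate
        w = np.maximum(np.abs(r), delta) ** (p - 2)    # weights |r_i|^(p-2) from the update above
        XtW = X.T * w                                  # X^T W, with W = diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)       # weighted normal equations
    return beta
</syntaxhighlight>

Each pass solves the weighted normal equations <math>X^\top W X \boldsymbol\beta = X^\top W \mathbf y</math>; for ''p'' = 1 the weights are <math>1/|r_i|</math>, targeting least absolute deviations.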
=== General case ===

More generally, starting with a [[diagonal matrix|diagonal]] weighting matrix equal to the [[identity matrix]], <math>W = I</math>, and a linear problem <math>A x = b</math>, the (weighted) linear equation

:<math> W A x = W b </math>

is formed. The least squares solution of this equation is then found using standard linear algebra methods. The [[errors and residuals in statistics|residuals]]

:<math> r = A x - b </math>

are calculated, and the weighting matrix is updated to some non-negative function <math>f(r)</math> of the residuals, e.g., <math>f(r) = 1/|r|</math>:

:<math> W = \operatorname{diag}( f(r) ). </math>

With these new weights, the weighted least squares equation is re-solved and the residuals are re-calculated. The process can be iterated many times.

The solution to which this iterative process converges is the minimizer of an objective function related to the function <math>f(r)</math>. With <math>f(r) = 1/|r|</math>, the objective is the [[least absolute deviation]] <math>\textstyle\sum_i |r_i|</math>.
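A compact sketch of this general loop (assuming [[NumPy]]; the helper name <code>irls</code>, the fixed iteration count, and the example data are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def irls(A, b, f, n_iter=50):
    """Repeatedly solve W A x = W b with W = diag(f(r))."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]           # first pass uses W = I
    for _ in range(n_iter):
        r = A @ x - b                                  # residuals of the current solution
        sw = np.sqrt(f(r))                             # square-root weights turn WLS into plain LS
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# With f(r) = 1/|r| (floored to stay finite), the iteration targets least absolute deviations.
A = np.column_stack([np.ones(5), np.arange(5.0)])      # illustrative design matrix
b = np.array([0.1, 1.1, 1.9, 3.2, 40.0])               # last observation is an outlier
x_lad = irls(A, b, lambda r: 1.0 / np.maximum(np.abs(r), 1e-8))
</syntaxhighlight>

Because the outlying observation receives a progressively smaller weight, the fit is pulled toward the remaining points rather than the outlier.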
== Convergence ==
Convergence of the method is not guaranteed.
== References ==
* ''Stanford Lecture Notes on the IRLS algorithm'' by Antoine Guitton
* ''Numerical Methods for Least Squares Problems'' by Åke Björck (Chapter 4: Generalized Least Squares Problems)
* ''Robust Estimation'' in ''Numerical Recipes in C'' by Press et al. (requires the FileOpen plugin to view)